DOCUMENT RESUME



SP 002 422 



ED 027 281 



By-Brown, Bob Burton

An Investigation of Observer-Judge Ratings of Teacher Competence. Final Report. 
Florida Univ., Gainesville. 

Spons Agency- Office of Education (DHEW), Washington, D.C. Bureau of Research. 



Bureau No-BR-5-1073 



Pub Date 31 Jan 69 
Contract-OEC-6-10-288
Note- 148p. 

EDRS Price MF-$0.74 HC-$7.50

Descriptors- Analysis of Variance, Attitudes, Attitude Tests, Behavior Rating Scales, Check Lists, Correlation, 
Educational Experiments, *Evaluation, Factor Analysis, Interaction Process Analysis, Interinstitutional 
Cooperation, *Lesson Observation Criteria, *Student Teachers, *Teacher Certification, *Teacher 
Qualifications 

Identifiers-CBRS, Classroom Behavior Rating Scale, Dogmatism Scale, D-Scale, PBI, Personal Beliefs Inventory, 
Teacher Evaluation Scale, Teacher Practices Inventory, Teacher Practices Observation Record, TES, TPI, 
TPOR 

Demonstrating and testing Conant’s recommendation that teacher competence 
should determine certification, this four-phase study from 1964-68 developed and 
field-tested procedures for evaluating teacher competence and for determining how 
evaluation is affected by the beliefs of student teachers and observer-judges. 
Following the Phase I orientation of observer-judges and evaluation of rating 
instruments and procedures, 539 observer-judges from colleges, public schools, and 
State Departments of Public Instruction rated 407 student teachers’ clinical
classroom performances over a one-year period (Phase II) with Teacher’s Classroom
Behavior instruments. Prior to rating, students and observer-judges took three Study
of Beliefs tests. Phase II data were statistically analyzed and compared with data
from Phase III in which 100 Phase II subjects, then first-year teachers, and 100 
experienced teachers were evaluated. Data analysis in Phase IV revealed: predictable 
interrelationships among teacher beliefs, teacher competence, observer descriptions, 
and observer-judge beliefs; belief gaps between colleges of education and public 
schools; and theory-practice discrepancies in teachers and observer-judges. 
Recommendations for use of these findings in teacher evaluation programs are made. 
(LP) 






U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE 
OFFICE OF EDUCATION 



THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE 
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS 
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION 
POSITION OR POLICY. 







Final Report 



Project No. D-182 
Contract No. OE-6-10-288



AN INVESTIGATION OF

OBSERVER-JUDGE RATINGS OF

TEACHER COMPETENCE



Bob Burton Brown 
University of Florida 
Gainesville, Florida 
January 31, 1969 



The research reported herein was performed pursuant to a 
contract with the Office of Education, U. S. Department of 
Health, Education, and Welfare. Contractors undertaking 
such projects under Government sponsorship are encouraged 
to express freely their professional judgment in the conduct 
of the project. Points of view or opinions stated do not, 
therefore, necessarily represent official Office of Education 
position or policy. 



U. S. DEPARTMENT OF 
HEALTH, EDUCATION, AND WELFARE 

Office of Education 
Bureau of Research 






ACKNOWLEDGMENTS 



Sincere appreciation is expressed to the many teachers and observer-judges
whose cooperation made possible the collection of data for this study. For
their assistance in organizing and supervising the collection of the data at
the various teacher education institutions, we are indebted to Dr. Leonard
Kennedy, Dr. James Cochrane, Dr. Elizabeth Jalbert, Dr. Herbert Kliebard,
Dr. George Beauchamp, Dr. Robert Maidment, Dr. Peter Oliva, and Mrs. Maude
Watkins.

Invaluable contributions to the project were rendered by research
assistants Dr. Jeaninne N. Webb, Dr. Tom R. Vickery, Mr. Winston
Summerhill, Dr. Rodney Johnson, Dr. Herschel Shosteck, Dr. Julie Joyce
Calory, and Mr. Wm. Nicholas Stoffel.

Appreciation is expressed to those who rendered services in the
processing and analysis of the data: Dr. William Mendenhall, Dr. P. V.
Rao, Dr. Fred Barnett, Mr. Robert Beaver, Mr. James T. McClave, Dr.
Frank Vickers, and Dr. Charles Bridges.

Special contributions as consultants to the project were made by
Dr. Norman Bowers, Dr. John Guy Fowlkes, Dr. George Beauchamp, and
Dr. Nicholas Fattu.

For their participation in the initial conception and planning
of the study, appreciation is due Dr. Merle Borrowman, Dr. Wilson Thiede,
Dr. Lindley J. Stiles, Dr. Julian Stanley, Dr. William Cartwright,
Dr. John I. Goodlad, Dr. B. J. Chandler, and Dr. James B. Conant.

Finally, appreciation is expressed to Marion Terry and Vida
Hamilton, secretaries; Tandel Brown, clerk-typist; and, above all,
Marie Smallwood, for her skills as typist, accountant, secretary,
coder and keeper of data, coordinator of staff, and loyal right arm.









TABLE OF CONTENTS 






ACKNOWLEDGMENTS ii

LIST OF TABLES AND FIGURES v

CHAPTER Page

I. INTRODUCTION 1

The Problem 1

Objectives 3

Research Foundation 3



II. RESEARCH PROCEDURES 6

General Design 6

Subjects 6

Instrumentation 7

Data Collection 8

Evaluation of Data 9

III. PHASE I - PILOT STUDY 11

Purposes 11

Subjects 11

Data Collection Procedures 11

Reliability Estimates - Teacher Practices Observation Record 12

Analysis of Data 12

Findings - TPOR Scores 13

TPOR Reliability Summary 22

Reliability Estimates - Classroom Behavior Rating Scale 22

Analysis of Data 22

Findings - CBRS Scores 23

CBRS Reliability Summary 25

Comparison of the Teacher Practices Observation Record and the Classroom Behavior Rating Scale 26

Identification of Predictor Variables 28

Teacher Practices Observation Record 28

Classroom Behavior Rating Scale 36

IV. PHASE II - OBSERVER-JUDGE RATINGS OF STUDENT TEACHERS 48

Purposes 48

Procedures 48

Analysis of Data - Using Total Scores of Observations and Evaluations 49

Findings of Total Score Analysis 56

Discussion of Total Score Analysis 56

Summary of Total Score Analysis 62






CHAPTER Page 

Re-examination of the Data Using Factor Scores 62

Analysis of Data Using Factor Scores 63

Findings of the Factor Scores 64

Discussion of the Factor Scores 66

Summary of Factor Score Analysis 69

V. PHASE III (FOLLOW-UP STUDY) 71

Purposes 71

Procedures 71

Data Collection 72

Analysis of Data 72

Findings 74

Discussion 74

Conclusions 81

VI. COMPARISON ANALYSIS OF PHASE II AND PHASE III DATA 82

VII. SUMMARY, CONCLUSION, AND RECOMMENDATIONS 87

Summary of Findings 88

Conclusions 93

Recommendations 95

APPENDIX A 98

APPENDIX B 110

APPENDIX C 126



LIST OF TABLES 



TABLE Page

1. Mean TPOR Scores Given Five Films by All Observers 13
2. Mean TPOR Scores Given Films on Repeated Observations One Year Apart 14
3. Correlation of TPOR Scores Obtained from Repeated Observations of Films 15
4. "Within-Observer" Reliability Coefficients for TPOR Scores on Repeated Viewings of Films 21
5. TPOR Internal Consistency Reliability Coefficients 21
6. Mean CBRS Scores for Films 23
7. "Within-Observer" Reliability Coefficients for CBRS Scores on Repeated Viewings of Films 24
8. CBRST and CBRSP Internal Consistency Reliability Coefficients 25
9. The Relationship Between TPOR Means and Evaluative Ratings of Five Filmed Teaching Episodes 27
10. CBRS Means and Evaluative Ratings 28
11. Definition of Variables 29
12. Interaction Model - TPOR - Summary of F-tests to Determine Significance of Variables in the Model 32
13. Summary of F-tests to Determine Significance of Variables in the Model - TPOR - Order 1 33
14. Summary of F-tests to Determine Significance of Variables in the Model - TPOR - Order 3 34
15. Final Model for TPOR, Analysis of Variance, Summary 35
16. Interaction Model - CBRSP - Summary of F-tests to Determine Significance of Variables in the Model 37
17. Summary of F-tests to Determine Significance of Variables in the Model - CBRSP - Order 1 38
18. Summary of F-tests to Determine Significance of Variables in the Model - CBRSP - Order 3 39
19. Final Model CBRSP, Analysis of Variance, Summary 41
20. Summary of F-tests to Determine Significance of Variables in the Model - CBRST - Order 1 42
21. Summary of F-tests to Determine Significance of Variables in the Model - CBRST - Order 3 43
22. Final Model CBRST, Analysis of Variance, Summary 44
23. Summary of F-tests to Determine Significance of Variables in the Model - CBRS - Order 1 45
24. Summary of F-tests to Determine Significance of Variables in the Model - CBRS - Order 3 46
25. Final Model CBRS, Analysis of Variance, Summary 47
26. Definition of Variables 51
27. Variables Which Contribute to Variance in Observation of Classroom Behavior 57
28. Variables Which Contribute to Variance in Evaluations of Teaching Competence 58
29. F-values of Theoretical and Empirical Factors 63
30. Teacher Practices Observation Record (TPOR) Regression Theoretical Factors 64
31. Teacher Practices Observation Record (TPOR) Regression Empirical Factors 64
32. Teacher Evaluation Scale (TES) Regression Theoretical Factors 65
33. Teacher Evaluation Scale (TES) Regression Empirical Factors 65
34. Variables Which Contribute to Variance in Observation of Classroom Behavior 75
35. Variables Which Contribute to Variance in Evaluations of Teaching Competence 76, 77
36. Results of Paired Difference t-tests Comparing Phases II and III 83
37. Multiple Regression - Model 1 85
38. Multiple Regression - Model 2 86



CHAPTER I 



INTRODUCTION 



The purpose of this study was to field test the use of judgments 
of teacher competence in classroom performances as the potential basis 
for teacher certification. To do this, the project demonstrated
and evaluated a number of ways in which both academic and education 
professors, supervisors of student teaching, cooperating public school 
teachers and administrators, and State Department of Public Instruction 
personnel may be brought together in teams to observe classroom teaching 
performances and to judge competence for teaching. 

The Problem 

As a result of a two-year study of teacher education and certifi-
cation policies, James B. Conant concluded: "The policy of certifica-
tion based on the completion of state-specified course requirements is
bankrupt."¹ Conant pointed out that requirements based on the completion
of specific academic and professional courses, or on programs approved by
a national accrediting agency, "cannot be enforced in such a manner that
the public can be assured of competent teachers, and they involve the
states in acrimonious and continuous political struggles, which may not
serve the public interest."² Consequently, Dr. Conant suggested the need
for alternative programs and policies which rely primarily on judgments
of competence in classroom teaching performance as the basis for teacher
certification.

The first and most central of the 27 recommendations by Conant
concerns certification based on evidence of competence:



For certification purposes the state should require only
(a) that a candidate hold a baccalaureate degree from a legitimate
college or university, (b) that he submit evidence of
having successfully performed as a student under the direction
of college and public school personnel in whom the State
Department has confidence, and in a practice-teaching situation
of which the State Department approves, and (c) that he
hold a specially endorsed teaching certificate from a college
or university which, in issuing the official document, attests
that the institution as a whole considers the person
prepared to teach in a designated field and grade level.³



¹James B. Conant, The Education of American Teachers (New York:
McGraw-Hill, 1963), p. 54.

²Ibid., p. 55.
³Ibid., p. 60.









Conant’s proposal suggests that both academic and pedagogical professors,
supervisors of student teaching, cooperating public school teachers and
administrators, State Department of Public Instruction personnel, and
possibly others be brought in to evaluate practice teaching and to
judge the candidate's mastery of the subject he teaches, his utilization
of educational knowledge, his mastery of the techniques of teaching,
and his possession of the intellectual and personality traits relevant
to effective teaching.⁴ The Conant plan also calls for teacher education
institutions, in conjunction with public school systems, to establish
a state-approved practice-teaching arrangement, and stipulates
that public school systems which enter contracts with teacher education
institutions for practice teaching should designate as classroom
teachers working with practice teaching only those persons in whose
competence as teachers, leaders, and evaluators they have the highest
confidence.⁵

This study was proposed to demonstrate and field test Conant’s
plan. The nationwide implications of this plan for teacher education
and certification are clear to those concerned with quality teaching.
The plan deserves a wide-scale demonstration to determine whether it can
be made workable. It should be neither rejected out of hand nor installed
as the prototype in the education and certification of teachers without
a thorough trial and evaluation.

In reaction to the Conant recommendations, a number of problems
involved in basing certification on judgments of teacher competence
can be pointed out. For example:

1. There is widespread skepticism among professional
educators that alternative teacher education programs
which rely on use of judgments of classroom teaching
quality can be (a) practical, (b) reliable, or (c)
acceptable to State Departments of Education as a basis for
legal certification of teachers.

2. Related to this skepticism, it is generally assumed that
lack of agreement regarding what should be the criteria for
"good" teaching constitutes an insurmountable roadblock to
basing certification on demonstrated competence.

3. There is also a popular belief that judgments of teaching
competence must somehow be "objective," and that rating
procedures which involve subjective value judgments are
unreliable and dangerous.

4. Established teacher education and certification practices
have been strongly influenced by the notion that there is
or should be "one best" definition of good teaching and
"one best" plan for the preparation of teachers.



⁴Ibid., p. 62.
⁵Ibid., p. 63.



Problems of this sort had to be dealt with in order to give the
Conant recommendations a fair trial. Therefore, consideration
of them was included in the objectives of the study.



Objectives 

The general objectives of this study were: 

A. To demonstrate and evaluate the use of judgments of
competence in classroom performances as a potential
basis for certifying teachers.

B. To make wider use of and test out in practical field
situations basic research knowledge of various processes
related to judging teacher competence from observations
of classroom performance, and, if possible, add to that
knowledge.

C. To develop and test procedures by which observer-judges
can evaluate teaching behavior using individual criteria
identifiable in terms of measured positions on relevant
value continua.

D. To involve both academic and pedagogical scholars in
an all-university approach to the process of evaluating
the qualifications of candidates for teacher certification.

E. To develop working partnerships between teacher education
institutions, State Departments of Public Instruction,
and local school systems that involve both shared
responsibility for teacher education and cooperative judgments
of candidates’ qualifications for teaching.

F. To provide descriptions of variation and central tendencies
in the performance of student teachers.

G. To provide descriptive information about observer-judge
ratings of teacher competence, including the identification
of factors influencing their reliability and validity,
as well as the variation and central tendencies of their
observations and evaluations.



Research Foundation

Basic to this demonstration are the recommendations of James B.
Conant, which are the product of his recent two-year study of the
education of American teachers.⁶ Although the Conant study may not
qualify as basic research, it does represent a painstaking survey and
penetrating assessment of the complex and controversial issues involving
the education and certification of teachers, conducted by a most
respected scholar, scientist, and statesman who was assisted by a strong
staff of qualified educators. Dr. Conant’s recommendations are rooted in
the findings of a study of current facts and issues, which is considerably
more than can be said for the rooted-in-tradition policies and procedures
his recommendations are designed to displace.

⁶Ibid.



With respect to the demonstration and evaluation of procedures
for observing classroom performance and judging teaching competence,
there is, fortunately, a considerably stronger research foundation.
Most pertinent to the implementation of the proposed program is the
vast amount of research on teaching drawn together under the editorship
of N. L. Gage.⁷ There are available a wide variety of experimentally
tested procedures for measuring classroom behavior by systematic
observation,⁸ for rating competence in teaching,⁹ for analyzing teaching
methods,¹⁰ for analyzing the teacher’s personality and characteristics,¹¹
and for measuring cognitive¹² and noncognitive variables¹³ in research
on teaching, which will be selected and used in this program.



⁷N. L. Gage (ed.), Handbook of Research on Teaching (Chicago:
Rand McNally & Co., 1963).

⁸Donald M. Medley and Harold E. Mitzel, "Measuring Classroom
Behavior by Systematic Observation," Handbook of Research on Teaching,
N. L. Gage (ed.) (Chicago: Rand McNally & Co., 1963), pp. 247-328.

⁹H. H. Remmers, "Rating Methods in Research on Teaching,"
Handbook of Research on Teaching, N. L. Gage (ed.) (Chicago: Rand
McNally & Co., 1963), pp. 329-378.

¹⁰Norman E. Wallen and Robert M. W. Travers, "Analysis and
Investigation of Teaching Methods," Handbook of Research on Teaching,
N. L. Gage (ed.) (Chicago: Rand McNally & Co., 1963), pp. 448-505.

¹¹J. W. Getzels and P. W. Jackson, "The Teacher’s Personality and
Characteristics," Handbook of Research on Teaching, N. L. Gage (ed.)
(Chicago: Rand McNally & Co., 1963), pp. 506-582.

¹²Benjamin S. Bloom, "Testing Cognitive Ability and Achievement,"
Handbook of Research on Teaching, N. L. Gage (ed.) (Chicago: Rand
McNally & Co., 1963), pp. 378-397.

¹³George G. Stern, "Measuring Noncognitive Variables in Research
on Teaching," Handbook of Research on Teaching, N. L. Gage (ed.)
(Chicago: Rand McNally & Co., 1963), pp. 398-447.



This investigation, while not designed primarily as a basic
research study, was compelled, in making intelligent use of available
procedures for measuring and judging teacher behavior, to draw upon
and add some increment to the basic research knowledge in this area.
The observation, judging, and evaluation phases of the program were
conducted with as much research rigor as possible under the circumstances.
Likewise, the data collected were submitted to the most
strenuous statistical analyses that could be found or developed.









CHAPTER II 



RESEARCH PROCEDURES 



General Design 

The study was carried out in four phases over a period of four
years (1964-1968). Phase I involved the selection, organization, and
orientation of observer-judges, as well as the selection and evaluation
of the observational and rating instruments and procedures. Phase II
involved observer-judges making multiple and repeated observations and
judgments of student teachers’ classroom performances in pre-service
clinical experiences. Phase III was a follow-up study involving
observations and judgments of a sample from Phase II subjects during
their first year of service as certified teachers. Phase IV was
concerned with analysis and evaluation of the data, and the preparation
of the final report.



Subjects 

The prospective teachers observed and judged were drawn from
those students enrolled in teacher education programs at Sacramento
State College in California, the University of Florida at Gainesville,
the State University of New York at Albany and at Oneonta, Northwestern
University in Evanston, Illinois, and the University of Wisconsin in
Madison. The teams of observer-judges were selected from the faculties
of these colleges and from cooperating public school systems, under the
supervision and approval of the State Department of Public Instruction
in each of the five states involved.

Six populations or groups of subjects were involved in this study:

(1) Observer-judges of filmed teaching episodes (Raters₁).

(2) Observer-judges of pre-service teaching performances (Raters₂).

(3) Observer-judges of in-service teaching performances (Raters₃).

(4) Five master teachers recorded on film (Ratees₁).

(5) Pre-service student teachers (Ratees₂).

(6) In-service teachers (Ratees₃).

Raters₁ consisted of college supervisors of student teaching,
education professors, and professors of academic subjects from four of
the six teacher education institutions: Sacramento, Albany, Northwestern,
and Wisconsin.



Raters₂ consisted of student teacher supervisors, education
professors, and academic professors from all six colleges, plus cooperating
teachers and principals from the public schools and, in a few cases,
personnel from State Departments of Public Instruction.

Raters₃ included not only members of the raters₂ group but also
teachers, supervisors, and administrators from the schools where the
ratees₂ were undergoing their initial in-service experience. Although
membership of the rater groups varied from phase to phase, there were
individual observer-judges who participated in all three rater groups.

Ratees₁ consisted of five master teachers whose teaching had been
recorded on film. These teachers were members of the faculty of Wisconsin
High School (a private school operated by the University of Wisconsin
until 1962) in 1959 and 1960, when the films were made. They were no
longer available for study beyond their performances as teachers on the
films.

Ratees₂ consisted of a sample of about 500 elementary and
secondary student teachers from the six teacher education institutions.
This was the group observed and evaluated in Phase II.

Ratees₃ consisted of 100 first-year teachers selected from
the ratees₂ group, plus 100 experienced teachers who were selected
from the schools in which the first-year teachers were employed.



Instrumentation 



Two sets of instruments were used in the collection of data for
this study: the first set, the Study of Beliefs, was used to assess
the beliefs of all raters and ratees; the second set, the Teacher’s
Classroom Behavior, was used by the raters to observe, record, and
evaluate the teaching performances of the ratees. The Study of Beliefs
was comprised of three instruments: the Personal Beliefs Inventory
(PBI), the Teacher Practices Inventory (TPI), and the Dogmatism Scale
(D-Scale). The Teacher’s Classroom Behavior included three instruments:
the Teacher Practices Observation Record (TPOR), the Classroom Behavior
Rating Scale (CBRS), and the Teacher Evaluation Scale (TES). Each of
these instruments is described as follows:



Personal Beliefs Inventory. The Personal Beliefs Inventory (PBI)
is a yardstick by which agreement-disagreement with the basic philosophy
of John Dewey may be measured. A high score on this inventory indicates
that one’s beliefs concur with Dewey's fundamental philosophic beliefs.
Reliabilities reported for the PBI vary from .55 to .78.¹



¹Bob Burton Brown, The Experimental Mind in Education (New York:
Harper and Row, 1968), Chapter VI.



Teacher Practices Inventory. The Teacher Practices Inventory (TPI)
measures agreement-disagreement with Dewey’s educational philosophy. A
high score indicates concurrence with Dewey’s beliefs about what teachers
should do in classrooms. Reliability coefficients reported for the TPI
range from .56 to .94.²

Dogmatism Scale. The Dogmatism Scale (D-Scale) measures the
structure of belief systems along an open-closed dimension.
Reliabilities of .68 to .93 have been reported for the D-Scale.³

Teacher Practices Observation Record. The Teacher Practices
Observation Record (TPOR) is a sign system for recording teacher
practices observed in a classroom. It measures the agreement-disagreement
of teachers' observed classroom behavior with educational practices
advocated by Dewey in his philosophy of experimentalism. A high score on
the TPOR indicates that the recorded behavior consisted of practices
which Dewey advocated. Establishment of the reliability of the TPOR is
reported in Phase I of the study.
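The reliability figures quoted for the Study of Beliefs instruments, and the internal-consistency coefficients the report later gives for the TPOR, express how consistently the parts of a scale measure the same thing. The report does not say at this point which internal-consistency formula was used. The sketch below assumes Cronbach's alpha, one common estimator, purely to illustrate the idea; the function name `cronbach_alpha` and the item scores are hypothetical, not data from the study.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one per respondent, each a list of k item scores.
    (Illustrative estimator only; not necessarily the one used in the report.)
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the respondents' total scores.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical item scores for four respondents on a three-item scale.
example = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(example), 3))
```

Because the item variances and the total-score variance use the same denominator, population and sample variances give the same alpha. Items that rise and fall together push alpha toward 1.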

Classroom Behavior Rating Scale. The Classroom Behavior Rating
Scale (CBRS) is a scale on which descriptive dimensions of teacher and
pupil behavior are rated on a six-point continuum. This scale was
developed from rating instruments used by Ryans and McGee in earlier
studies.⁴

Teacher Evaluation Scale. The Teacher Evaluation Scale (TES)
is a six-point scale (enlarged in Phase III to an eighteen-point scale
in order to permit observers to make finer discriminations). The TES
is an instrument which enables the rater to evaluate the competence
of the teacher observed.

These instruments may be found in Appendixes A and B.



Data Collection 



Phase I. In the spring of 1965, viewing sessions of filmed
episodes of teaching behavior were held at four of the participating
teacher training institutions. During these sessions observer-judges
(raters₁) were introduced to the use of the TPOR, recording
observations of each of five filmed episodes. Twelve months later
raters₁ viewed two of these films again, repeating the Teacher’s
Classroom Behavior observations. Prior to the initial viewing sessions
all observer-judges completed the Study of Beliefs.

²Ibid.

³Milton Rokeach, The Open and Closed Mind (New York: Basic
Books, Inc., 1960).

⁴David G. Ryans, Characteristics of Teachers (Washington, D.C.:
The American Council on Education, 1960); H. M. McGee, "Measurement of
Authoritarianism and Its Relation to Teacher's Classroom Behavior,"
Genetic Psychology Monograph, 52:89-146, 1955.
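The repeated Phase I viewings, in which the same observers scored the same films twelve months apart, yield pairs of scores from which a test-retest estimate can be computed. A minimal sketch of one simple such estimate is shown below, assuming the Pearson correlation between first and second viewings; the report develops its own design for "within-observer" coefficients, which may differ, and the function name and TPOR totals here are hypothetical.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean1 = sum(first) / n
    mean2 = sum(second) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(first, second))
    ss1 = sum((a - mean1) ** 2 for a in first)
    ss2 = sum((b - mean2) ** 2 for b in second)
    # Raises ZeroDivisionError if either list has no variation.
    return cov / sqrt(ss1 * ss2)


# Hypothetical TPOR totals from the same five observers, one year apart.
first_viewing = [112, 98, 130, 105, 121]
second_viewing = [115, 101, 126, 100, 124]
print(round(pearson_r(first_viewing, second_viewing), 2))
```

A coefficient near 1 means observers who scored a film high (or low) the first time tended to do so again a year later.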

Phase II. For Phase II, additional personnel (cooperating
teachers and administrators) were added to the raters₁ group; together
these observer-judges made up the raters₂ group. The ratees₂ group was
composed of student teachers completing their pre-service clinical
teaching experiences in the spring of 1966. Both the observer-judges
and the student teachers completed the Study of Beliefs. During
the student teaching experience each pre-service teacher’s classroom
behavior was repeatedly observed by teams of observer-judges using the
Teacher’s Classroom Behavior as an observation and rating instrument.

Phase III. Additional ratees (experienced teachers) and raters
(public school personnel) were added as subjects in Phase III. The
Study of Beliefs was completed by each of the added personnel. Each
of the ratees₃ was observed systematically by a team of three
observer-judges during the winter and spring of 1967.



Evaluation of Data 



Phase I. The scores from the Study of Beliefs and Teacher’s
Classroom Behavior were analyzed to:

(1) develop a design for estimating reliability coefficients for
the recorded observations of the observer-judges

(2) identify variables that could be used in predicting the
observation scores and ratings given a teacher by an
observer-judge.
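Item (2) is pursued in Phase I through F-tests of whether variables contribute significantly to a linear model of the scores (Tables 12 through 25). As a hedged illustration of the basic idea only, the sketch below computes a partial F statistic for adding one predictor to a model that otherwise predicts every score by its mean. The report's actual models include interactions and several orderings of variables, which this does not reproduce; the helper names and the data are hypothetical.

```python
def rss_mean_only(y):
    """Residual sum of squares when every score is predicted by the mean."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)


def rss_one_predictor(x, y):
    """Residual sum of squares for the least-squares line y = a + b*x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def partial_f(rss_reduced, rss_full, p_reduced, p_full, n):
    """F statistic for adding (p_full - p_reduced) parameters to a nested linear model."""
    df_num = p_full - p_reduced
    df_den = n - p_full
    return ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)


# Hypothetical data: an observer-belief score (x) and the observation
# score (y) each of five observers gave the same filmed teacher.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
f = partial_f(rss_mean_only(y), rss_one_predictor(x, y), p_reduced=1, p_full=2, n=len(y))
print(f)  # compare with an F(1, 3) critical value
```

The reduced model has one parameter (the mean) and the full model two (intercept and slope), so the statistic is referred to an F distribution with 1 and n - 2 degrees of freedom.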

Phase II. Phase II data were analyzed to identify:

(1) the relationships between the beliefs and observed
practices of the ratees₂

(2) the relationships between beliefs, observations, and
evaluations of the raters₂

(3) the interaction of these relationships

(4) variables which contribute information to the prediction
of ratings given teachers by observer-judges.

Phase III. Phase III data were analyzed to:

(1) identify the relationships between the beliefs and observed
practices and evaluations of ratees₃

(2) identify the relationships between the beliefs, observations,
and evaluations of raters₃

(3) identify the interactions between these relationships

(4) compare the raters₂' and raters₃' observation scores and
evaluation scores of pre-service teachers (ratees₂) to
those of the same individuals as first-year teachers (ratees₃)

(5) compare the observations and evaluations of first-year
teachers to those of experienced teachers given by raters₃

(6) identify characteristics of raters which become predictive
of ratings given certain characteristics and behavior of
ratees

(7) identify characteristics and behavior of ratees which become
predictive of ratings of teacher competence

(8) identify the relationships of ratings in Phase II with ratings
in Phase III for (a) all ratees and (b) ratees given extremely
good ratings compared with ratees given extremely poor ratings.
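Item (4) compares the same individuals as student teachers and as first-year teachers, and the report lists paired difference t-tests comparing Phases II and III (Table 36). A minimal sketch of that test follows. The function name and scores are hypothetical; TPOR totals are used because the TES changed from a six-point to an eighteen-point scale between phases, so raw TES ratings would not be directly comparable.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(phase2, phase3):
    """Paired-difference t statistic and its degrees of freedom.

    phase2, phase3: scores for the same teachers, in the same order.
    Differences are taken as phase3 - phase2.
    """
    if len(phase2) != len(phase3) or len(phase2) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [b - a for a, b in zip(phase2, phase3)]
    # Standard error of the mean difference (fails if all differences are equal).
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical TPOR totals for four teachers in Phase II and again in Phase III.
t, df = paired_t([3, 4, 2, 5], [4, 5, 4, 6])
print(t, df)
```

Pairing each teacher with himself removes between-teacher differences, so the test asks only whether the typical within-teacher change differs from zero.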



CHAPTER III 










PHASE I - PILOT STUDY 



Purposes 

The objectives of Phase I were (1) to select and orient the
observer-judges who were to serve in the study, (2) to acquaint these
observer-judges with the observation and rating instruments, (3) to
measure their relevant value positions, (4) to establish estimates of
their reliability as classroom observers, and (5) to identify variables
that could be used in predicting the observation scores and ratings
given a teacher by the observer-judges.



Subjects

Subjects - Observer-Judges. The observer-judges, all volunteers,
were student-teaching supervisors, education professors, and professors
of academic subjects drawn from the faculties of four of the teacher
training institutions¹ that participated in the study. These
observer-judges (raters₁) recorded observations of filmed teaching episodes
during the spring of 1964 and again a year later in the spring of 1965.
A total of 130 subjects served as observer-judges for this phase.

Subjects - Ratees. The ratees were five experienced teachers
whose teaching behavior had been filmed at the University of Wisconsin.
These teachers all held master's degrees and had been selected as
outstanding teachers. From the unedited films, 50 to 60 minutes in length,
30-minute continuous segments were cut. Selection of the films and of
the segments taken from them was made for purposes of achieving variety
in teaching style, grade level, and subject matter taught. Film #1 was
of a ninth-grade French class; Film #2, a seventh-grade mathematics
class; Film #3, a fourth-grade unit on "Weather"; Film #4, a ninth-grade
speech class; and Film #5, a seventh-grade science class. These
five films were used (1) for the orientation of the observer-judges,
(2) to gather data for reliability studies, and (3) to gather data for
identification of variables for predicting observation and rating scores.



Data Collection Procedures

Prior to the viewing sessions the observer-judges completed the
Study of Beliefs; for each observer-judge, data were gathered which gave
a quantitative score for the measurement of personal and educational
beliefs. In addition, information regarding the sex, age, occupation,
and institutional affiliation of each subject was obtained.

¹University of Wisconsin, Northwestern University, Sacramento
State College, and State University of New York at Albany.



During a six-week interval, film observation and recording
sessions were held on the campus of each of the four participating
institutions. Conditions of the viewing sessions were similar. All
observer-judges received the same 10-minute explanation, by the same
person, of how to record both their observations in the Teacher Practices
Observation Record and their ratings on the Classroom Behavior Rating
Scale and Teacher Evaluation Scale. During the viewing of Film #1, time
was called so that the observer-judges could become familiar with
the observational procedures and instrumentation. This constituted
the orientation provided the observers. As one of the major purposes
of the study was to investigate observation and rating of teachers on
the basis of a rater's individual criteria, no attempt was made to
bring the observer-judges to agreement concerning their recorded
observations, nor was any discussion to this effect permitted. For the
other four films, no assistance of any kind was given the
observer-judges. For each film viewed, each observer-judge completed the
full set of observation and rating instruments.

Mass observations of films are expensive and administratively 
difficult to arrange. For these reasons, repeated observations the 
second year could be obtained on only two of the five films. Film #1 
was eliminated because it had been used as the orientation film and 
conditions of the first viewing could not be duplicated. Data obtained 
from the first viewing of Film #3 indicated a wide discrepancy in scores 
based on viewing locations, which could have been due to the artificial 
conditions under which it was filmed. Film #5 had not been observed at 
all four institutions. This left Films #2 and #4 for the second viewing. 
It was possible to obtain repeated TPOR scores on these two films from 
only a portion of those who observed the first viewings. 



Reliability 

Reliability Estimates 
Teacher Practices Observation Record 



Analysis of Data 

Means and standard deviations of observation scores were computed
for each film for each viewing session. Means were examined to
determine if significant differences in TPOR scores were given at the four
participating institutions or by the three major occupational
classifications of observer-judges.

Data were submitted to analysis of variance to develop a
between-observer reliability coefficient. The data were also used to develop
statistical procedures for establishing within-observer reliability
estimates. Lastly, the data were submitted to the Kuder-Richardson
Formula 20 for measuring item reliability.









Findings - TPOR Scores 

Mean Scores. Table 1 shows the mean TPOR score given each of
the five films by the observer-judges on the first viewing. The French
teacher in Film #1 was seen as the least experimental (least in agreement
with Dewey) and the fourth-grade teacher in Film #3 as the most in
agreement with Dewey. The range of more than 40 points between the high
and low TPOR means indicates the ability of the instrument to
differentiate various styles of teaching.



Table 1 

Mean TPOR Scores Given Five Films 
by All Observers 




Differences in the mean TPOR scores given at the four different 
participating institutions were examined. The location variable was 
found to have little or no influence. Using Scheffe’s comparisons, no 
statistically significant differences were found among the TPOR means 
given at the various locations for Films #1, #2, #4, and #5. The only 
statistically significant differences were found between California 
and each of the other three locations on Film #3. 

Differences in the mean TPOR scores given by the three major
occupational classifications of observer-judges (college supervisors of
student teaching, education professors, and academic professors) were
also examined. No statistically significant differences were found
between any of these groups for Films #1, #2, #4, and #5. The only
statistically significant differences were found between supervisors
of student teaching and both education and academic professors on
Film #3.

Scores for the two viewings of Films #2 and #4 were compared
and are reported in Table 2. There is a fairly substantial difference
between the TPOR means recorded for the first and second viewings of Film
#2. While this difference raises some questions about stability, both
means for this subgroup of 69 observers lie well within one standard
deviation of the mean of 115.86 given by all 119 first-viewing observers,
so the difference may simply reflect the normal variability of TPOR
scores. The differences between TPOR scores for the first and second
viewings of Film #4 are very small.



Table 2

Mean TPOR Scores Given Films on
Repeated Observations One Year Apart

Film     Viewing    No. of Observers     Mean     S.D.
No. 2    1st               69           122.22    20.52
No. 2    2nd               69           109.81    18.31
No. 4    1st               72           107.15    17.15
No. 4    2nd               72           105.14    18.12



Reliability Coefficients. Reliability of instruments of measurement
is a complex concept which becomes compounded when dealing with the
measurement of classroom behavior by systematic observation. The
question of the reliability of the observers and the recording of their
observations must be added to the problem of instrument reliability. In
the past most observational studies have limited their study of observer
reliability to computing the correlation between two sets of observations
or to figuring the percent of agreement between observers. Following
this procedure, the correlations between the TPOR scores obtained from
the repeated observations of Films #2 and #4 were computed and are
reported in Table 3. The correlations of the columns (10-minute observation
periods) within each film observation are very high, but the
correlations between the 1964 and 1965 observations are very low. The
first indicates that the observers tended to maintain the same relative
position in the group throughout the viewing of a single film on a
given day. The second indicates that sizable shifts in these positions
took place during the intervening year. In other words, there was good
consistency within one occasion or viewing, and again within another,
but poor stability between two widely separated occasions. One must
keep in mind, however, that such reliability coefficients normally
decline as the time between "tests" increases. Had
the repeat observations been made only a month or so apart, considerably
higher correlations might have been expected.
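The test-retest coefficients in Table 3 are ordinary Pearson correlations between two sets of scores given by the same observers. A minimal sketch of that computation follows; the function name and the five pairs of TPOR totals are hypothetical, not study data.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical TPOR totals from five observers on two viewings of one film.
viewing_1964 = [122, 98, 135, 110, 117]
viewing_1965 = [105, 112, 118, 101, 120]
r = pearson_r(viewing_1964, viewing_1965)
print(round(r, 2))
```

The same function applied to two 10-minute columns from one viewing gives the within-occasion coefficients; applied across years it gives the stability coefficients.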

Table 3

Correlation of TPOR Scores
Obtained from Repeated Observations of Films

FILM NO. 2

                         1964 Observation                1965 Observation
                           TPOR Column                     TPOR Column
                         1      2      3    TOT          1      2      3    TOT
1964 Observation  1   1.00    .79    .69    .89        .36    .25    .12    .27
                  2      -   1.00    .81    .95          -    .29    .16    .31
                  3      -      -   1.00    .92          -      -    .20    .29
                TOT      -      -      -   1.00          -      -      -    .32
1965 Observation  1      -      -      -      -       1.00    .61    .55    .80
                  2      -      -      -      -          -   1.00    .81    .93
                  3      -      -      -      -          -      -   1.00    .90
                TOT      -      -      -      -          -      -      -   1.00

FILM NO. 4

                         1964 Observation                1965 Observation
                           TPOR Column                     TPOR Column
                         1      2      3    TOT          1      2      3    TOT
1964 Observation  1   1.00    .75    .52    .86        .32    .36    .25    .34
                  2      -   1.00    .71    .93          -    .46    .52    .52
                  3      -      -   1.00    .85          -      -    .67    .65
                TOT      -      -      -   1.00          -      -      -    .57
1965 Observation  1      -      -      -      -       1.00    .79    .71    .90
                  2      -      -      -      -          -   1.00    .83    .95
                  3      -      -      -      -          -      -   1.00    .92
                TOT      -      -      -      -          -      -      -   1.00

Even so, correlation of two sets of scores by a number of different
observers is not likely to be a very accurate estimate of reliability.
It is difficult to make arrangements for large numbers of
observers to view the same classroom on two different occasions, or to
control variations between those occasions. Likewise, the number of
classrooms observed on two different occasions by two different observers
is likely to be small. In either case, the size of the N determines
the precision of the correlation coefficient, and since the N of even
well-financed observational studies rarely exceeds 100, the confidence
intervals for the coefficients are extremely wide. Furthermore, such
correlations are usually based on total scores, which ignore variations
in scoring individual items or categories. It is possible to obtain a
perfect correlation of total scores when the reliability of the items
is zero. If on a 70-item "sign" system, for example, the 35
odd-numbered items are marked "+" and the 35 even-numbered items are marked
"0" on the first observation, and then exactly reversed on the second
observation, identical total scores will be obtained and used to
produce a deceptively perfect reliability correlation.
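The arithmetic of the 70-item example can be checked directly. In this minimal sketch a "+" is coded as 1 and a "0" as 0, an encoding chosen here only for illustration.

```python
# One observer's hypothetical 70-item "sign" record: 1 = "+", 0 = "0".
first_obs = [1 if item % 2 == 1 else 0 for item in range(1, 71)]   # odd items marked "+"
second_obs = [1 - mark for mark in first_obs]                       # exactly reversed

# The total scores are identical, so any correlation built on totals looks perfect...
total_first, total_second = sum(first_obs), sum(second_obs)

# ...yet not a single item was scored the same way on both occasions.
item_agreement = sum(a == b for a, b in zip(first_obs, second_obs)) / len(first_obs)

print(total_first, total_second, item_agreement)   # 35 35 0.0
```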

Percent of agreement between observers tells almost nothing about
the accuracy of the scores obtained. It is entirely possible to find
observers agreeing 99 percent in recording behaviors on an instrument
whose item or category consistency is very poor. Reliability can be
low even though observer agreement is high, for several reasons. For
example, observers might be able to agree perfectly that a particular
teaching practice occurred in a classroom, yet if that same practice
occurs equally, or nearly so, in all classrooms, the reliability of that
item as a measure of differences between teachers will be zero.
Near-perfect agreement could also be reached about the percentage of time a
number of teachers employed certain categories of behavior; but if every
teacher sharply reversed these percentages from period to period or day
to day, the reliability of these categories would be zero. Errors
arising from variations in behavior from one situation or occasion to
another can far outweigh errors arising from failure of two observers
to agree exactly in their records of the same behavior.
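The first case can be made concrete with hypothetical records: two observers note whether one practice occurred in each of eight classrooms. The helper names and data in this sketch are illustrative only.

```python
def percent_agreement(obs_a, obs_b):
    """Percent of classrooms in which two observers recorded the same value."""
    matches = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100 * matches / len(obs_a)

def between_teacher_variance(scores):
    """Population variance of an item's scores across classrooms (teachers)."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)

# The practice occurs in every classroom, and both observers say so (1 = occurred).
observer_a = [1] * 8
observer_b = [1] * 8

print(percent_agreement(observer_a, observer_b))   # 100.0
print(between_teacher_variance(observer_a))        # 0.0
```

With no variance between teachers, the item cannot separate one teacher from another, however well the observers agree.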



Yet the reliability of most instruments for systematically
recording the behavior of teachers requires a high percent of observer
agreement. "Between-observer" agreement has become almost a cardinal
principle in planning observational studies. According to Medley and
Mitzel, a sample of classrooms from the population to be studied should
be visited by trained recorders using the observational instrument in
the same way it will be used in any subsequent study. In order to study
the "objectivity" of the items, i.e., how closely observers agree in
recording identical behaviors, at least two recorders should be present
on each visit, sitting in different parts of the room and making
independent records. In order to estimate how closely two
records based on different visits will agree, each class should be
visited at least twice. To recapitulate, in their words, "c teachers
are visited in s situations by a team of r recorders. In studying
the reliability of a scale with i items on it, the total number of scores
to be analyzed will be cris."²



To match this rigorous plan for data collection, Medley and Mitzel
have taken the classic definition of reliability,

$$\rho_{xx} = \frac{\sigma_T^2}{\sigma_X^2},$$

and applied it to measurements of classroom behavior. In this definition,
the true variation, $\sigma_T^2$, is defined to be the variation of the
total score for any class (teacher) when the effects of recorders
(observers), items on the scoring instrument, and situations (viewings
or visits) have been removed. The true variation plus "error," $\sigma_X^2$,
is defined to be the variation of the total scores for any class,
including variation contributed by items on the scoring instrument,
recorders, situations, and random error. The smaller the effect of the
recorders, items, and situations on a class total, the higher the
reliability coefficient will be. In other words, if the instrument
has high reliability, the scoring of the class or teacher is relatively
free of the effects of recorders, items, or the different situations
under which the scoring was done, and as such reflects a "good" or
reliable instrument.³

²Donald M. Medley and Harold E. Mitzel, "Measuring Classroom
Behavior by Systematic Observation," Handbook of Research on Teaching,
N. L. Gage (ed.) (Chicago: Rand McNally & Co., 1963), p. 309.

In seeking a design for estimating the reliability of TPOR
observations, we closely examined the four-way analysis of variance
model suggested by Medley and Mitzel. While it was found to be a
sound approach to reliability estimation, it may not be entirely
appropriate for analyzing the data obtained in the film study described
above. For instance, in the simple example given by Medley and Mitzel
in the Handbook of Research on Teaching, page 316, where one item is
used to score 24 classes (teachers) observed during four situations by
two recorders (observers), the reliability coefficient is estimated by

$$\rho_{xx} = 1 - \frac{MS_{c \times r}}{MS_c},$$

where $MS_{c \times r}$ is the mean square for classes x recorders and
$MS_c$ is the mean square for classes, both obtained from the analysis of
variance table. The coefficient of reliability in this case actually
reflects not instrument reliability but rather recorder or observer
reliability. When $MS_{c \times r}$ is large, it indicates an inconsistency
on the part of the observers in scoring the classes in the same way, which
in turn causes $\rho_{xx}$ to be small. In like manner, a very small value
of $MS_{c \times r}$ reflects consistency in scoring, in which case
$\rho_{xx}$ will be large.
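As a rough illustration of that ratio, the sketch below collapses the design to a single item and a single situation, so that only a classes x recorders table remains. This two-way simplification, the function name, and the data are assumptions for illustration; it is not the full four-way Medley-Mitzel model.

```python
def rho_from_classes_by_recorders(scores):
    """Estimate rho_xx = 1 - MS_cxr / MS_c from scores[c][r], the score given
    class c by recorder r (one item, one situation; needs >= 2 of each)."""
    n_c, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_c * n_r)
    class_means = [sum(row) / n_r for row in scores]
    recorder_means = [sum(row[r] for row in scores) / n_c for r in range(n_r)]

    ss_classes = n_r * sum((m - grand) ** 2 for m in class_means)
    ss_recorders = n_c * sum((m - grand) ** 2 for m in recorder_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_interaction = ss_total - ss_classes - ss_recorders   # classes x recorders

    ms_classes = ss_classes / (n_c - 1)
    ms_interaction = ss_interaction / ((n_c - 1) * (n_r - 1))
    return 1 - ms_interaction / ms_classes

# Recorder 2 always scores one point higher: a constant bias, no interaction.
print(rho_from_classes_by_recorders([[1, 2], [3, 4], [5, 6]]))   # 1.0
# The recorders reverse their ordering of the first two classes.
print(rho_from_classes_by_recorders([[1, 3], [3, 1], [5, 5]]))   # about 0.667
```

Because only the classes x recorders interaction enters the numerator, a recorder who is uniformly more generous does not lower the estimate; only disagreement about how the classes compare does.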

Training of the observers undoubtedly would bring them into
agreement with respect to recording or scoring identical behaviors, which
would be reflected in a higher reliability coefficient, $\rho_{xx}$. However,
in the previously described film study in which the TPOR was tried out,
no attempt was made to train the observers. To the contrary, a
deliberate attempt was made to preserve the differences among observers by
selecting them from varying occupational groups, from varying sizes of
institutions with varying orientations to teacher education, and from
varying parts of the country. We wanted to test the reliability of
the TPOR under uncontrolled field conditions to see what value it might
have in the hands of the differing kinds of people who carry out the
everyday responsibilities for teacher education in America. Hence,
the component of variance due to the observers' variability in this
study would cause $\sigma_X^2$ to be large compared to $\sigma_T^2$,
resulting in a small $\rho_{xx}$. There was not as much observer variability
as might have been expected, however. When the Medley-Mitzel model was
adapted to fit the film study data, the TPOR observations were found to
have a modest but substantial reliability coefficient of .57.

In the analysis of variance example cited above it should also
be noted that two of the variables of interest, viz., classes and
situations, had but one degree of freedom each. This being the case, "poor"
estimates of the components of variance could result. In fact, the
components of variance could be estimated to be zero (which happens in
many cases). Also, since the estimate of $\rho_{xx}$ would consist of the
ratio of linear combinations of mean squares, the bounds of error on
this estimate could be exceedingly large.

³Ibid.

The unsuitability of the Medley-Mitzel model for these data results
primarily, however, from the fact that it stresses "between-observer"
variability rather than "within-observer" variability. This is a
philosophical rather than a statistical issue. Reliability coefficients
which reward high agreement between observers imply that one should seek
a single, uniform, "objective" system for observing and classifying
teaching behavior. From the point of view of the framework underlying
the development of the TPOR, objectivity in perceiving and quantifying
such behavior is neither possible nor desirable. "Between-observer"
agreement may not only encourage a false sense of confidence with
respect to the accuracy of measurements but also give a false sense
of "objectivity" regarding the observations. A team of observers can be
trained to the point of near-perfect agreement, but this does not erase
the possibility that instead of several differing "subjective" judgments,
they now make only one. Therefore, another mathematical definition of
reliability was sought, one which is concerned primarily with
"within-observer" variability.

It was reasoned that if, having scored a given filmed teaching
situation, the same observer-judge were to score the same teaching
situation again in the same way, then it could be said that the
observer-judge's scoring was reliable. Hence, a definition of "within-observer"
reliability for a given observer-judge and film was devised as follows:



Item      Viewing 1     Viewing 2     Difference
1         x_11          x_21          d_1
2         x_12          x_22          d_2
3         x_13          x_23          d_3
.         .             .             .
.         .             .             .
.         .             .             .
n         x_1n          x_2n          d_n

where

$$d_i = x_{1i} - x_{2i}, \qquad i = 1, \ldots, n,$$

and $\sigma_d^2$ denotes the variance of the differences $d_i$.



If the scores are independent, i.e., the judge is not consistent, or in
fact marks by chance, then

$$\begin{aligned}
V(d_i) &= V(x_{1i} - x_{2i}) \\
       &= V(x_{1i}) + V(x_{2i}) \\
       &= \sigma^2 + \sigma^2 \\
       &= 2\sigma^2 \quad (\text{or } 2\,\mathrm{Var}(x)).
\end{aligned}$$



However, if the judge is consistent from viewing to viewing, his two
scores should be positively correlated, and now

$$\begin{aligned}
V(d_i) &= V(x_{1i} - x_{2i}) \\
       &= V(x_{1i}) + V(x_{2i}) - 2\,\mathrm{Cov}(x_{1i}, x_{2i}) \\
       &= 2\sigma^2 - 2\sigma_{12},
\end{aligned}$$

or

$$V(d_i) = \sigma_d^2 = 2\sigma^2 - 2\sigma_{12}.$$



It is noted that the following assumptions are made in the above
discussion:

1) The variance of each item score is the same for all items
over viewings; i.e.,

$$V(x_{ij}) = \sigma^2 \quad \text{for } i = 1, 2; \; j = 1, \ldots, n.$$

2) Under the complete randomness assumed under chance scoring,
each value of x is assumed to have an equal chance of being
selected; hence

$$P(x) = \frac{1}{k},$$

where k is the number of choices available.
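The step from $V(x_{1i} - x_{2i})$ to $2\sigma^2 - 2\sigma_{12}$ rests on the identity Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b), which also holds exactly for sample statistics computed with a common divisor. A small check on made-up item scores for one judge (the scores and helper names are hypothetical):

```python
def sample_var(xs):
    """Sample variance with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical scores one judge gave five items on viewing 1 and viewing 2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
d = [a - b for a, b in zip(x1, x2)]   # d_i = x_1i - x_2i

lhs = sample_var(d)
rhs = sample_var(x1) + sample_var(x2) - 2 * sample_cov(x1, x2)
print(lhs, rhs)   # 1.0 1.0
```

Because these two viewings are positively correlated, the variance of the differences (1.0) is well below the 5.0 that the sum of the two item variances alone would give.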



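Now we define, for judge j and film f,

$$\rho_{jf} = 1 - \frac{\sigma_d^2}{2\sigma^2},$$

where

$$\sigma_d^2 = \mathrm{Var}(d_i), \quad i = 1, \ldots, n,$$

$$\sigma^2 = \mathrm{Var}(x_{ij}), \quad i = 1, 2; \; j = 1, \ldots, n.$$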



However, under the assumption of a random choice by the judge, $\sigma^2$
becomes a constant, computed as

$$\sigma^2 = \sum_x (x - \mu)^2 \, P(x).$$

We calculate the sample value $s_d^2$ and use it to estimate $\sigma_d^2$.
Hence we are working with the statistic

$$r_{jf} = 1 - \frac{s_d^2}{2\sigma^2}.$$
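A minimal sketch of this statistic follows. It assumes items are scored on an integer scale 1 through k, with every point equally likely under chance scoring, and it takes $s_d^2$ as the usual sample variance with divisor n - 1. Both choices are assumptions; the report does not state the item coding or the divisor here, and the function names are hypothetical.

```python
from statistics import variance

def chance_variance(k):
    """sigma^2 = sum over x of (x - mu)^2 P(x), with P(x) = 1/k for x = 1..k."""
    values = range(1, k + 1)
    mu = sum(values) / k
    return sum((x - mu) ** 2 for x in values) / k

def within_observer_r(viewing_1, viewing_2, k):
    """r_jf = 1 - s_d^2 / (2 sigma^2) for one judge's item scores on one film."""
    d = [a - b for a, b in zip(viewing_1, viewing_2)]   # d_i = x_1i - x_2i
    s_d2 = variance(d)                                   # sample variance of the d_i
    return 1 - s_d2 / (2 * chance_variance(k))

print(chance_variance(5))                                       # 2.0
print(within_observer_r([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], 5))   # 1.0
print(within_observer_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], 5))   # -1.5
```

The third call illustrates how a negative value can arise: reversing the order of every item makes the variance of the differences larger than $2\sigma^2$.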



Now, if there is in fact high positive correlation of the scoring
from viewing 1 to viewing 2, then

$s_d^2$ will be small (i.e., $s_{12}$ will be large), and $r_{jf}$ will approach 1.



If the scoring from viewing to viewing is in fact independent
and really associated with a chance event, then

$s_d^2$ will be of the magnitude of $2\sigma^2$ (i.e., $s_{12}$ will be small, close to zero), and $r_{jf}$ will be close to 0.

The coefficient $r_{jf}$ will theoretically be in the interval (0, 1),
where a maximum value of one implies perfect correlation, while a
minimum value of zero implies the same scoring could have happened by chance,
hence no reliability. However, $r_{jf} < 0$ is possible, because
there is a non-zero probability that the scorings will be negatively
correlated, and this may cause $s_d^2$ to be greater than $2\sigma^2$; this in
turn causes $r_{jf} < 0$.

Worth mentioning is the fact that this statistic uses a
larger-than-expected variance, $\sigma^2$, as a yardstick against which the
judge's variation from viewing to viewing is compared. This is because one
would expect a judge to select the extremes in scoring an item less frequently
than scores near the center of the scale; such scoring would likely
yield a variance smaller than that implied by a completely random
selection. This yardstick could, in effect, cause the coefficient $r_{jf}$ to
be depressed as compared with other measures of reliability.

Using the above formulation, the "within-observer" reliability of
TPOR scores was computed for the two filmed teaching situations on which
repeated viewings were made a year apart. Table 4 shows eight
reliability coefficients ranging between .48 and .62.

These coefficients of reliability reflect observer reliability
rather than instrument reliability. In order to determine the internal
consistency of the TPOR, its item reliability, which would indicate
something of its potential in the hands of reliable observers, the
film study data were submitted to Kuder-Richardson Formula 20 for
estimating item reliability. Table 5 shows these results.












Table 4

"Within-Observer" Reliability Coefficients for
TPOR Scores on Repeated Viewings of Films

FILM NO. 2 (N = 69)

TPOR Column    r_jf    error
TOT            .48     .0255
1              .57     .0177
2              .51     .0194
3              .51     .0177

FILM NO. 4 (N = 72)

TPOR Column    r_jf    error
TOT            .52     .0191
1              .56     .0182
2              .57     .0244
3              .62     .0171



Table 5

TPOR Internal Consistency Reliability Coefficients

                                   TPOR Columns
Film     Viewing      N       1       2       3      TOT
No. 1    1st        158       -       -       -      .86
No. 2    1st         69     .79     .81     .83      .93
No. 2    2nd         69     .77     .81     .79      .91
No. 3    1st        140       -       -       -      .93
No. 4    1st         72     .76     .77     .78      .90
No. 4    2nd         72     .76     .78     .77      .91
No. 5    1st         84       -       -       -      .85












If each item is highly correlated with every other item on the
instrument, then the instrument has good item reliability or internal
consistency. The fact that the TPOR scores yielded uniformly high
internal reliability coefficients is not surprising in light of the
fact that throughout their development the TPOR, TPI, and PBI underwent
repeated RAVE analysis, an iterative procedure which yields a set of
item response weights which maximize the internal consistency of
inventories.⁴
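Kuder-Richardson Formula 20 is defined for dichotomously scored items: KR-20 = k/(k - 1) x (1 - Σ p_i q_i / σ_total²), where k is the number of items, p_i is the proportion scoring 1 on item i, q_i = 1 - p_i, and σ_total² is the variance of the total scores. A minimal sketch on a hypothetical 0/1 response matrix follows; how the TPOR and CBRS responses were coded for the formula is not described here, so the 0/1 input is an assumption.

```python
def kr20(responses):
    """KR-20 for responses[p][i] in {0, 1}: respondent (observer) p, item i."""
    n, k = len(responses), len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n   # divisor n

    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n   # proportion scoring 1 on item i
        sum_pq += p * (1 - p)                      # item variance, also divisor n

    return (k / (k - 1)) * (1 - sum_pq / var_total)

# Hypothetical responses of four observers to three items.
data = [[1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
print(kr20(data))   # 0.75
```

The divisor n is used for both the item variances and the total-score variance so that the two terms are on the same footing.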



TPOR Reliability Summary 

Having submitted this instrument to the hazards of uncontrolled
use by uncontrolled observers, and then to the most severe
statistical procedures that could be found, we can draw the following
conclusions as to reliability estimates for the Teacher Practices
Observation Record: (1) correlation of observers' total scores within a
given film viewing: VERY GOOD; (2) correlation of observers' total
scores between repeat film viewings one year apart: POOR to FAIR; (3)
between-observer reliability: FAIR; (4) within-observer reliability:
FAIR; (5) internal consistency reliability: VERY GOOD.



Reliability Estimates 
Classroom Behavior Rating Scale 



The Classroom Behavior Rating Scale is an instrument used in this
study to rate the behavioral characteristics of teachers and their
students on an authoritarian-egalitarian dimension. The scale consists
of thirteen items which describe teacher characteristics, Classroom
Behavior Rating Scale-Teacher (CBRST), and four items which describe
pupil characteristics, Classroom Behavior Rating Scale-Pupil (CBRSP).
Each item is scored by the observer on a six-point continuum; the
higher the score, the more authoritarian the behavior observed. A
maximum score of 102 indicates extreme authoritarian behavior, and a
minimum score of 17 indicates non-authoritarian behavior.
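The bounds follow from 17 items (13 CBRST and 4 CBRSP) each rated from 1 to 6: 17 x 1 = 17 and 17 x 6 = 102. A small sketch of the scoring, assuming the total is the unweighted sum of the item ratings (an assumption; the function name is hypothetical):

```python
def cbrs_scores(teacher_items, pupil_items):
    """Return (CBRST, CBRSP, total) from 13 teacher and 4 pupil ratings on a 1-6 scale."""
    if len(teacher_items) != 13 or len(pupil_items) != 4:
        raise ValueError("expected 13 teacher items and 4 pupil items")
    if any(not 1 <= r <= 6 for r in teacher_items + pupil_items):
        raise ValueError("each rating must lie on the six-point scale 1-6")
    cbrst, cbrsp = sum(teacher_items), sum(pupil_items)
    return cbrst, cbrsp, cbrst + cbrsp

print(cbrs_scores([6] * 13, [6] * 4))   # (78, 24, 102): most authoritarian
print(cbrs_scores([1] * 13, [1] * 4))   # (13, 4, 17): least authoritarian
```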



Analysis of Data 



Means and standard deviations of rating scores were computed for
each film. Means were examined to determine if significant differences
in CBRS scores were given at the four participating institutions or by
the three major occupational classifications of observer-judges.

⁴Ronald Ragsdale and Frank B. Baker, The Method of Reciprocal
Averages for Scaling of Inventories and Questionnaires: A Computer
Program for the CDC 1604 Computer (Mimeographed, Laboratory of
Experimental Design, Department of Educational Psychology, U. of Wis., Madison).

⁵Description of instrument can be found in Chapter II, p. 8.



Using the same procedures developed for estimating
within-observer reliability of the Teacher Practices Observation Record
(reported in the previous section of this report), within-observer
reliability coefficients were computed for the CBRS. Data were also
submitted to the Kuder-Richardson Formula 20 to establish
internal-consistency measures for the instrument.



Findings - CBRS Scores 

Mean Scores. Table 6 shows the mean Classroom Behavior Rating
Scale scores given each of the five films by the observer-judges on the
first viewing.



Table 6

Mean CBRS Scores for Films

Film     No. of Observers    Mean     S.D.
No. 1          130           47.80    11.68
No. 2          124           37.21     9.94
No. 3          119           35.94    11.64
No. 4          119           41.70    14.36
No. 5           67           42.54    11.73



The French teacher in Film #1 received the highest score and hence
was seen as the most authoritarian, while the fourth-grade teacher in
Film #3 was seen as the least authoritarian by virtue of receiving the
lowest mean score. The range of slightly less than twelve points (47.80
versus 35.94) between the high and low CBRS means indicates the rather
limited ability of the instrument to differentiate between teachers.

The CBRS scores given at the four participating institutions 
were examined for differences and it was determined that the location 
variable had little or no influence on mean scores. No statistically 
significant differences were found among the CBRS means given at the 
various locations in Films 1, 2, 4, and 5. The only differences having 
statistical significance were between California and each of the other 
three locations on Film #3. 

CBRS scores were also examined for differences with respect to
the three major occupational classifications of the observer-judges:
clinical supervisors, education professors, and academic professors.
Again, no statistically significant differences were found for Films
1, 2, 4, and 5 between groups of observer-judges. In Film #3,
significant differences were found between the clinical supervisors and both
academic and education professors. These differences in mean CBRS
scores parallel the statistically significant differences
found for the TPOR scores.

Reliability Coefficients. The Classroom Behavior Rating Scale
is an adaptation of an instrument developed by Ryans and McGee,⁶ who
have reported between-observer reliability coefficients for trained
observers. As the purpose of this study was to use untrained
observer-judges who would be participating in the study over periods of twelve
to thirty-six months, and who were deliberately prevented from
developing criteria which would enable them to increase their agreement, it
was determined that within-observer reliability coefficients rather than
between-observer coefficients would be a much more reasonable
reliability estimate for this instrument. As observer-judges would be
recording behavior over a relatively long period of time and their
perceptual differences in observing and rating teacher behavior were
encouraged rather than "trained out," the consistency of an
observer-judge's ratings over time seemed to be of most importance in
establishing observer reliability.

Table 7 reports the within-observer reliability coefficients
computed for the two films which had been observed a year apart. These
coefficients were determined for the two sections of the instrument,
teacher characteristics (CBRST) and pupil characteristics (CBRSP).



Table 7

"Within-Observer" Reliability Coefficients for
CBRS Scores on Repeated Viewings of Films

FILM NO. 2 (N = 69)

           r_jf    error
CBRST      .86     .0191
CBRSP      .84     .0067

FILM NO. 4 (N = 72)

           r_jf    error
CBRST      .79     .0562
CBRSP      .83     .0208



⁶Ryans and McGee (for reference see page 8 of this report).












Within-observer reliability coefficients for the Classroom 
Behavior Rating Scale range from .79 to .86. 

In addition to consideration of observer reliability, the
internal consistency of the CBRS was examined. As some of the items of
the CBRS had been altered or changed in adapting it for use in this
study, an analysis of its item reliability was deemed important in
order that confidence could be placed in its use. Table 8 shows the
results of the CBRS data when submitted to the Kuder-Richardson Formula
20 for estimating internal consistency.



Table 8 

CBRST and CBRSP Internal Consistency 
Reliability Coefficients 




The item reliability for the CBRST ranges from .71 to .85; item 
reliability for the CBRSP from .30 to .57. 



CBRS Reliability Summary 

The film data do not indicate that the Classroom Behavior Rating Scale
is a strong discriminator between the behaviors of teachers.
As it consists of only seventeen items, each of which is scored on a
six-point continuum by the observer at the end of a thirty-minute
observation period, it tends to measure general impressions of the teacher
by the observer rather than discrete teaching behaviors. For this very
reason it tends to enjoy good within-observer reliability: the observer
seems to respond to the same teaching behavior, as a general
perceptual set, in much the same manner over a period of time. The CBRST
also has good item reliability; the CBRSP item reliability coefficient
is, of course, influenced by the very small number of items which
comprise this section of the instrument. However, from the film study
data we can conclude that the CBRS enjoys good within-observer
reliability and adequate internal consistency.






Comparison of the Teacher Practices Observation Record and
the Classroom Behavior Rating Scale



The Teacher Practices Observation Record is an instrument for
systematically describing classroom behavior in terms of
agreement-disagreement with John Dewey's Experimentalism. The Classroom Behavior
Rating Scale is an instrument used to rate the behavioral characteristics
of teachers on an authoritarian-egalitarian dimension. From the data of
the film study, some comparisons can be made between the two instruments.

Figure 1 shows the relationship of the TPOR and CBRS mean scores 
for the same filmed teaching episodes. 



Figure 1 

Relationship of TPOR and CBRS Mean Scores 




Data from the five film viewings show that the mean scores on the two instruments consistently move in opposite directions. A high TPOR score (agreement with experimentalism) accompanies a low CBRS rating (nonauthoritarian behavior characteristics). Thus there seems to be a relationship between experimental behavior and nonauthoritarian behavior characteristics of a teacher: the observer-judges tended to see teachers who are in agreement with experimentalism as nonauthoritarian in behavior, and vice versa.

When TPOR means are examined in relation to the evaluative judgments (Teacher Evaluation Scale scores) that the observer-judges made about the quality of teaching observed in the films, an interesting pattern of relationship between TPOR scores and evaluations appears. Table 9 reports these data. While this could mean that the TPOR scores were influenced by how much the observer liked what he saw, the converse is more likely true. The wide differences in TPOR means within each of the evaluative categories are evidence that the relationship between TPOR scores and ratings holds only relative to each individual film. In this study, a given TPOR score did not guarantee a "good" or "bad" rating, even though in every case the higher the rating, the higher the TPOR mean score.



Table 9

The Relationship Between TPOR Means and Evaluative
Ratings of Five Filmed Teaching Episodes

                                  Evaluative Ratings
          A              B            C            D           E           F
Film      Outstanding    Very Good    Good         Fair        Poor        Incompetent
No. 1     88.64 (11)     82.21 (48)   79.45 (33)   67.89 (9)   -- (0)      -- (0)
No. 2     126.47 (19)    118.57 (56)  109.19 (21)  -- (0)      -- (0)      -- (0)
No. 3     138.19 (27)    119.32 (38)  109.91 (23)  85.00 (1)   -- (0)      -- (0)
No. 4     110.69 (29)    106.86 (43)  93.73 (11)   76.50 (2)   75.50 (2)   -- (0)
No. 5     115.56 (9)     103.39 (13)  96.00 (23)   88.67 (6)   65.00 (1)   -- (0)

Numbers in parentheses are the numbers of ratings in each category.



Statistically significant differences beyond the .05 level (using Scheffé's comparison procedures) were found for the following pairs of means:

Evaluative Category A: Films 1 and 2, 1 and 3, 3 and 4 (1 and 4, 1 and 5 were very close)
Evaluative Category B: Films 1 and 2, 1 and 3, 1 and 4, 1 and 5
Evaluative Category C: Films 1 and 2, 1 and 3 (1 and 5 were close)
Film No. 3: Categories A and B, A and C
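The Scheffé criterion for a single pairwise comparison can be sketched as follows. The report gives neither the within-cell mean square nor the critical F it used, so both are caller-supplied inputs here, and the numbers in the usage line are hypothetical (3.89 is the tabled .05 value of F for 2 and 12 degrees of freedom, matching a toy layout of three groups of five).

```python
def scheffe_pair(mean_i, n_i, mean_j, n_j, ms_within, k_groups, f_crit):
    """Scheffé test for the contrast mean_i - mean_j among k_groups means.

    ms_within: error (within-groups) mean square from the one-way ANOVA.
    f_crit: tabled F at the chosen alpha with (k_groups - 1, N - k_groups) df.
    Returns (statistic, critical value, significant?).
    """
    # F statistic for the pairwise contrast
    stat = (mean_i - mean_j) ** 2 / (ms_within * (1 / n_i + 1 / n_j))
    # Scheffé multiplies the tabled F by (k - 1) to protect all contrasts at once
    critical = (k_groups - 1) * f_crit
    return stat, critical, stat > critical

# Hypothetical: three groups of five observations, MS_within = 4.0
print(scheffe_pair(10.0, 5, 6.0, 5, 4.0, k_groups=3, f_crit=3.89))
```

The (k - 1) multiplier is what makes the procedure conservative: it holds the error rate at alpha across every possible contrast, which is why some pairs in the list above are described only as "close."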







Table 10 shows the relationship between CBRS mean scores and the evaluations given each film. The relationship between CBRS scores and evaluations holds only relative to each individual film, as the rather wide differences in CBRS means within each of the evaluative categories show. However, the higher the evaluation, the lower the CBRS mean score; the more authoritarian the teacher is seen to be, the lower his rating.



Table 10

CBRS Means and Evaluative Ratings

                                  Evaluative Ratings
          A              B            C            D           E           F
Film      Outstanding    Very Good    Good         Fair        Poor        Incompetent
No. 1     38.83 (11)     43.00 (48)   53.43 (33)   67.11 (9)   0.0 (0)     0.0 (0)
No. 2     29.47 (19)     36.96 (56)   43.90 (21)   0.0 (0)     0.0 (0)     0.0 (0)
No. 3     26.74 (27)     35.87 (38)   46.39 (23)   57.00 (1)   0.0 (0)     0.0 (0)
No. 4     32.21 (29)     41.81 (43)   54.45 (11)   67.50 (2)   70.00 (2)   0.0 (0)
No. 5     24.67 (9)      36.46 (13)   48.22 (23)   59.83 (6)   68.00 (1)   0.0 (0)

Numbers in parentheses are the numbers of ratings in each category.



Thus, for both the TPOR and the CBRS there is a consistent relationship between mean scores and evaluative ratings: positive for the TPOR and negative for the CBRS. Within each film, the more the teacher is seen to agree with experimentalism and to be nonauthoritarian in behavior, the higher the evaluative rating he receives.



Identification of Predictor Variables 



Teacher Practices Observation Record 



Analysis of Data. One of the basic purposes of Phase I was to identify the variables that would predict the observation score given a teacher by an observer-judge. Using the Teacher Practices Observation Record scores as responses, multiple regression models were developed to isolate important variables that would be useful in explaining the observation score given by an observer, based on information available about the observer and the filmed teaching situation he observed. Thus, an attempt was made to describe the score given by an observer as a linear function of the variables describing the observer and the filmed teaching episode. A regression equation was fitted by the method of least squares, with the observer's score as a function of these descriptive variables.
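The dummy coding and least-squares fit described here can be sketched with NumPy. In this sketch the observations, factor levels, and coefficient values are invented for illustration; each qualitative factor gets one 0/1 column per non-baseline level (film 1 and the University of Wisconsin serve as baselines, as in Table 11), and because the scores are generated without noise the fit recovers the coefficients exactly. The report does not describe its computer program, so this is not a reproduction of it.

```python
import numpy as np

def dummy_code(labels, levels):
    """One 0/1 column per level except levels[0], which serves as the baseline."""
    return np.array([[1.0 if lab == lev else 0.0 for lev in levels[1:]] for lab in labels])

def fit_least_squares(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical observations: film viewed, observer's institution, one belief score
films = ["film1", "film2", "film3", "film1", "film2", "film3", "film1", "film3"]
insts = ["Wisconsin", "Albany", "Albany", "Albany", "Wisconsin", "Wisconsin", "Albany", "Albany"]
belief = np.array([50.0, 62.0, 47.0, 55.0, 58.0, 61.0, 44.0, 52.0])

X = np.column_stack([
    dummy_code(films, ["film1", "film2", "film3"]),  # film 1 is the baseline
    dummy_code(insts, ["Wisconsin", "Albany"]),       # Wisconsin is the baseline
    belief,                                           # quantitative variable enters as is
])
true_coef = np.array([90.0, 35.0, 40.0, 3.0, -0.1])  # invented values
y = np.column_stack([np.ones(len(belief)), X]) @ true_coef  # noise-free scores
print(np.round(fit_least_squares(X, y), 4))
```

Holding one level of each factor out as the baseline is what keeps the design matrix from being singular: with an intercept plus one column for every level, the dummy columns of a factor would sum to the intercept column.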

Table 11 is a description of the variables considered in this 
investigation. 

Table 11

Definition of Variables

Name of Variable        Statistical   Definition
                        Variable
Response                y1            TPOR score
Films                   x1            1 if film #2, 0 otherwise   (film #1 in base line)
                        x2            1 if film #3, 0 otherwise
                        x3            1 if film #4, 0 otherwise
                        x4            1 if film #5, 0 otherwise
Belief Scores           x5            Personal Beliefs Inventory
                        x6            Teacher Practices Inventory
                        x7            Personal Opinion Questionnaire
Subject Matter Field    x8            1 if Soc. St., 0 otherwise   (elementary in base line)
  of Observer           x9            1 if Nat. Sci., 0 otherwise
                        x10           1 if Math, 0 otherwise
                        x11           1 if Eng. or For. Lang., 0 otherwise
                        x20           1 if Generalist, 0 otherwise
Occupational            x12           1 if Methods Prof., 0 otherwise   (clinical sup. in base line)
  Classification        x13           1 if Education Prof., 0 otherwise
  of Observer           x14           1 if Academician, 0 otherwise
Sex                     x15           1 if female, 0 if male
Age                     x16           chronological age
Institution             x17           1 if Northwestern, 0 otherwise   (U. of Wis. in base line)
                        x18           1 if Albany, 0 otherwise
                        x19           1 if Sacramento, 0 otherwise



Since the most complex model encountered in this investigation is that for the Teacher Practices Observation Record score, this model will be examined in its symbolic form to clarify its meaning and its use. The $\beta$'s (betas) represent numerical coefficients, the $x$'s the independent variables, and $y$ the response or score. The model is

$$
y = \beta_0
  + \underbrace{\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4}_{\text{films}}
  + \underbrace{\beta_{17} x_{17} + \beta_{18} x_{18} + \beta_{19} x_{19}}_{\text{institution}}
  + \underbrace{\beta_{12} x_{12} + \beta_{13} x_{13} + \beta_{14} x_{14}}_{\text{occupation}}
  + \underbrace{\beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7}_{\text{belief scores}}
$$


Here $x_1$, $x_2$, $x_3$, and $x_4$ describe the films; $x_{17}$, $x_{18}$, and $x_{19}$ describe the institution; and $x_{12}$, $x_{13}$, and $x_{14}$ describe the occupation of the judge. Each of these qualitative variables takes on the value one if it describes the judge or the film of interest, and the value zero otherwise. The quantitative variables $x_5$, $x_6$, and $x_7$ represent the judge's belief scores and take on the values of those scores for a given judge.



Suppose that a clinical supervisor from the University of Wisconsin views film one. To predict his rating, we set all the qualitative variables equal to zero, which gives the prediction equation

$$
y = \beta_0 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 .
$$

Hence, by substituting the judge's belief scores ($x_5$, $x_6$, $x_7$), we obtain a predicted value, $y$, of the judge's rating (within bounds of error).

Now suppose an academician from Albany views film three. To predict his rating, we let $x_2 = 1$, $x_{14} = 1$, and $x_{18} = 1$, setting all the other qualitative variables equal to zero. The prediction equation now becomes

$$
y = (\beta_0 + \beta_2 + \beta_{14} + \beta_{18}) + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 ,
$$

which yields a predicted value, $y$, of the judge's rating when the belief scores are substituted for $x_5$, $x_6$, and $x_7$ (within bounds of error).

With this prediction equation,⁷ one can predict the rating any judge would give any film, provided one knows his belief scores, his academic position, and the university he represents.
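The prediction procedure can be expressed as a small function. The sketch below hard-codes the coefficients of the final TPOR equation reported in Table 15; the dictionary keys, the function name `predict_tpor`, and any belief-score inputs are illustrative assumptions. The Wisconsin clinical supervisor viewing film one is the all-zero baseline case described above.

```python
# Coefficients of the final TPOR equation (Table 15); baselines carry 0.0.
INTERCEPT = 90.8766
FILM = {1: 0.0, 2: 35.6526, 3: 40.9784, 4: 23.5528, 5: 21.5150}           # x1..x4
INSTITUTION = {"Wisconsin": 0.0, "Northwestern": 2.1853,
               "Albany": 3.0010, "Sacramento": 10.8685}                    # x17..x19
OCCUPATION = {"clinical supervisor": 0.0, "methods professor": -8.1203,
              "education professor": -3.5617, "academician": -1.6493}    # x12..x14
BELIEF = (-0.1020, 0.0514, -0.0326)  # x5 PBI, x6 TPI, x7 Personal Opinion Questionnaire

def predict_tpor(film, institution, occupation, pbi, tpi, poq):
    """Predicted TPOR score: intercept, plus the active dummy coefficients, plus belief terms."""
    y = INTERCEPT + FILM[film] + INSTITUTION[institution] + OCCUPATION[occupation]
    y += BELIEF[0] * pbi + BELIEF[1] * tpi + BELIEF[2] * poq
    return y

# With belief scores set to zero, only the constant terms remain
print(round(predict_tpor(1, "Wisconsin", "clinical supervisor", 0, 0, 0), 4))  # beta_0 = 90.8766
print(round(predict_tpor(3, "Albany", "academician", 0, 0, 0), 4))  # beta_0 + beta_2 + beta_14 + beta_18 = 133.2067
```

In practice the judge's actual PBI, TPI, and questionnaire scores replace the zeros; the dictionaries make explicit that a dummy-coded factor contributes exactly one coefficient (or none, for the baseline level) to any single prediction.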



⁷ William Mendenhall, An Introduction to Linear Models and the Design and Analyses of Experiments (Belmont, California: Wadsworth Publishing Co., Inc., 1966).






In the model search, a detailed (interaction) model was first considered for the response (the Teacher Practices Observation Record score). Including the interaction terms was found not to increase the predictive value of the model (see Table 12).



A simpler model (main effect model) was investigated next in an attempt to isolate the variables most useful in predicting an observer's score. A stepwise regression was performed using the variables films, belief scores, sex of the observer, age, subject matter field of the observer, occupational classification, and institutional affiliation.

At any given step in this regression program, the reduction in the total variation of the ratings accounted for by regression on the variables already entered is computed and tabulated. As each additional variable is entered, the additional reduction in the total variation is computed and tested for statistical significance. Because the order in which the variables are entered can affect these results, five different orderings that the experimenter considered important were investigated.

In every case, the orderings identified the same variables as important. Two orderings are reported here: the ordering in which the variable "films" is entered first, and the ordering in which "films" is entered last, so that one can see the strong, overriding influence that "films" has on a judge's rating.
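The effect of entry order can be illustrated on simulated data. This sketch assumes ordinary least squares and blocks of columns that enter together (the way "films" or "institution" enter as groups of dummy variables); it records the cumulative R² and its increment at each step and is not the report's stepwise program. Because the simulated "institution" block is correlated with "films", the credit each block receives depends on when it enters, while the final R² does not.

```python
import numpy as np

def r_squared(y, blocks):
    """R^2 of an OLS fit of y on an intercept plus the given column blocks."""
    X = np.column_stack([np.ones(len(y))] + blocks)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def sequential_r2(y, named_blocks, order):
    """Enter blocks in the given order; return (name, cumulative R^2, increment) per step."""
    rows, entered, previous = [], [], 0.0
    for name in order:
        entered.append(named_blocks[name])
        r2 = r_squared(y, entered)
        rows.append((name, r2, r2 - previous))
        previous = r2
    return rows

# Simulated data (not the study's): three predictor blocks, two of them correlated
rng = np.random.default_rng(0)
n = 300
films = rng.normal(size=(n, 2))
inst = 0.5 * films[:, :1] + rng.normal(size=(n, 1))
beliefs = rng.normal(size=(n, 3))
y = films @ np.array([3.0, 2.0]) + inst[:, 0] + 0.3 * beliefs[:, 0] + rng.normal(size=n)
blocks = {"films": films, "institution": inst, "beliefs": beliefs}

for order in (["films", "institution", "beliefs"], ["beliefs", "institution", "films"]):
    for name, r2, inc in sequential_r2(y, blocks, order):
        print(f"{name:12s} R^2 = {r2:.4f}  increment = {inc:.4f}")
    print()
```

Entering a block first credits it with any variation it shares with later blocks; entering it last leaves it only its unique contribution. A block that remains significant in both positions, as "films" did in the report, is robust to this ambiguity.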

Findings - TPOR Scores. In the investigation of the TPOR scores, the variables found to contribute to the accuracy of prediction were films, institutions, occupational classification, and belief scores. Tables 13 and 14 provide the basis for this decision. Note the statistical significance of the variable films regardless of the order in which it enters the model. The final regression equation fitted to the variables considered most important for prediction is given in Table 15, together with the analysis of variance and summary table. The symbol R² represents the fraction of the total variability of the judges' ratings accounted for by regression. With only the variable films fitted, R² is .4330. Adding the other three variables (institutions, occupations, and belief scores) increases R² to .4799, showing that the other variables account for only about 5% more of the variability.

Table 12

Interaction Model - TPOR

[Tables 12 through 14 are illegible in the source scan.]

Table 15

Final Model for TPOR

Equation

$$
\begin{aligned}
y_1 = {} & 90.8766 + 35.6526\,x_1 + 40.9784\,x_2 + 23.5528\,x_3 + 21.5150\,x_4 \\
 & + 2.1853\,x_{17} + 3.0010\,x_{18} + 10.8685\,x_{19} \\
 & - 8.1203\,x_{12} - 3.5617\,x_{13} - 1.6493\,x_{14} \\
 & - 0.1020\,x_5 + 0.0514\,x_6 - 0.0326\,x_7
\end{aligned}
$$

Analysis of Variance

Source        df     SS             MS            F
Regression    13     138,128.961    10,625.305    37.832**
Error         533    149,695.113    280.854

s = 16.7587

**denotes significance at .01 level

Summary

Variables                                          R²       Inc. in R²
Films                                              .4330    .4330
Films, Institution                                 .4615    .0285
Films, Institution & Occupation                    .4705    .0090
Films, Institution, Occupation & Belief Scores     .4799    .0094

Summary - TPOR Multiple Regression Models. Forty-three percent of the variance in the Teacher Practices Observation Record scores is accounted for by the filmed teaching episodes; the differences among the filmed teachers' behavior account for this much of the variance in the scores recorded by the observers.

With the addition of the other variables in this model (belief scores, occupation, and institution of the observers), the total percentage of variance accounted for is forty-eight percent. This indicates that the additional variables add only five percent to the total amount of variability that can be identified. Thus, fifty-two percent of the variance in observation scores is unaccounted for and must be attributed to chance, error, or unmeasured variables.
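The Table 15 analysis of variance can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers printed in Table 15: it recomputes R² as the regression sum of squares over the total sum of squares, the unexplained share behind the "fifty-two percent" figure, the residual standard deviation s as the square root of the error mean square, and the number of judgments implied by the degrees of freedom.

```python
from math import sqrt

# Analysis-of-variance entries from Table 15 (final TPOR model)
ss_reg, df_reg = 138_128.961, 13
ss_err, df_err = 149_695.113, 533

r2 = ss_reg / (ss_reg + ss_err)   # share of total variation explained by regression
ms_err = ss_err / df_err          # error mean square
s = sqrt(ms_err)                  # residual standard deviation
n_obs = df_reg + df_err + 1       # total df + 1 = number of judgments analyzed

print(f"R^2 = {r2:.4f}, unexplained = {1 - r2:.4f}")
print(f"MS(error) = {ms_err:.3f}, s = {s:.4f}, n = {n_obs}")
```

The recomputed values agree with the table (R² = .4799, s = 16.7587) and show that the model was fitted to 547 judgments.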

The additional variables that were included in the models (age, 
sex, and subject matter field of the observer) seemed to have no statis- 
tically significant influence on observer scores, at least in this situa- 
tion. 



From the analysis of the TPOR scores in the first phase of the 
study it can be tentatively concluded that there are many powerful 
factors affecting the perceptions of the observers in recording class- 
room behavior that have not been identified. However, the data indicated 
that the Teacher Practices Observation Record has substantial power for 
distinguishing differences among the five filmed teachers. 



Classroom Behavior Rating Scale 



Analysis of Data. The data from the first phase of the study were also analyzed to predict the Classroom Behavior Rating Scale score that the observer-judge gave the filmed teacher behavior. The same statistical procedures (multiple regression models) were used to identify predictor variables of the CBRS scores as were used for the Teacher Practices Observation Record scores (see page 19). The variables considered for this investigation were the same as those considered for the TPOR and are reported in Table 11.

The Classroom Behavior Rating Scale consists of seventeen items that describe teacher and pupil behavioral characteristics. Thirteen of the items describe teacher characteristics (Classroom Behavior Rating Scale-Teacher, CBRST), and the remaining four items describe pupil characteristics (Classroom Behavior Rating Scale-Pupil, CBRSP). For the purposes of this analysis, the CBRST and CBRSP scores were first considered independently and then combined. Thus, three responses (CBRSP, CBRST, and CBRS) were used to isolate predictor variables.

In the model search an interaction model was first considered for 
the three responses. It was found in each case that including the 
interaction terms did not increase the predictive value of the models. 

See Table 16. 

Next, main effect models were investigated to identify the variables most useful in predicting an observer's rating. Again, stepwise regressions were performed using the variables films, belief scores, sex, age, and subject matter field of the observer, occupational classification, and institutional classification.

Findings - CBRS Rating Scores. In investigating the CBRSP ratings, the variables films, belief scores, and occupation were found to be most useful in predicting the observer's rating. Tables 17 and 18 display the results of the two orderings, again indicating that films are perhaps the most important variable to be considered. The final fitted regression equation, using only films, belief scores, and positions, is given in Table 19, together with an analysis of variance and summary table. Considering only the variable films, R² = .0978, whereas including the variables belief scores and positions increases R² only to .1439, further pointing out the dominance of films as a predictor variable.

Since R² remains so low, one infers that the true variance of the ratings is quite high and is decreased very little by taking account of the variables we have measured.

Table 16 [illegible in source scan]

Summary of F-tests to Determine Significance of Variables in the Model [illegible in source scan]

The variables films, belief scores, and occupation were found to be the important variables in predicting the CBRST rating (see Tables 20 and 21). Using these variables, the final prediction equation was fitted; it is given in Table 22. With films as the only variable, R², the fraction of the total variation in the ratings accounted for by regression, is .1357. Adding the variables positions and belief scores increases R² to .1791. Hence, of the three variables considered important, films is again the most useful. The low value of R² suggests that the true variability of the ratings is quite high and is decreased very little by taking account of the variables measured.
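The increments reported in the summary tables can be tested with the usual partial F statistic for adding a group of q predictors: the gain in R² per added degree of freedom, divided by the unexplained share per error degree of freedom of the larger model. The sketch applies it to the Table 22 rows for the CBRST, where the three belief scores raise R² from .1500 to .1791 with 536 error degrees of freedom. Because it uses the rounded R² values, the result is approximate, and it is not necessarily the statistic reported in Tables 20 and 21.

```python
def r2_increment_f(r2_reduced, r2_full, q, df_error_full):
    """Partial F for adding q predictors, computed from the two models' R^2 values."""
    gain_per_df = (r2_full - r2_reduced) / q
    unexplained_per_df = (1 - r2_full) / df_error_full
    return gain_per_df / unexplained_per_df

# Table 22 (CBRST): films & occupation (.1500), then + three belief scores (.1791)
f_beliefs = r2_increment_f(0.1500, 0.1791, q=3, df_error_full=536)
print(round(f_beliefs, 2))
```

A small R² increment can still be statistically reliable when the error degrees of freedom are large, which is why the belief scores earn a place in the model while adding less than three percent to the variance explained.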

The CBRS rating is a combination of the CBRST and CBRSP ratings; therefore, one would expect to reach the same conclusions for this rating as for the CBRST and CBRSP. This is indeed the case. Films, occupation, and belief scores are the best predictor variables for the CBRS rating (see Tables 23 and 24). When the prediction equation is fitted to these three variables (see Table 25), R² is found to be .1691, an increase of .0510 from the value R² = .1181 obtained by fitting films only.

The high variability inherent in the CBRST and CBRSP is also evident in the combined CBRS rating, in the form of a low value of R².
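Because the CBRS score is the sum of the CBRST and CBRSP scores and all three final models use the same predictors, least squares, being linear in the response, implies that each Table 25 coefficient should equal the sum of the matching Table 19 and Table 22 coefficients. The check below assumes the three models were fitted to the same judgments. It confirms the identity to rounding error when the film-2 coefficient of Table 19 is taken as -7.5939; the scanned Table 19 prints it with a plus sign, which would break the identity for that one term.

```python
# Coefficient order: intercept, x1..x4 (films), x5..x7 (beliefs), x12..x14 (occupation)
CBRSP = [22.7161, -7.5939, -8.1812, -4.3280, -2.7585,
         0.0659, -0.0203, 0.0292, 3.78683, 1.4535, -0.1421]   # Table 19
CBRST = [8.1052, -2.9417, -3.9550, -1.9296, -2.5529,
         0.0282, -0.0062, 0.0027, 1.3782, 0.4116, 0.1595]     # Table 22
CBRS = [30.8213, -10.5357, -12.1362, -6.2576, -5.3114,
        0.0941, -0.0265, 0.0319, 5.1650, 1.8651, 0.0174]      # Table 25

# OLS is linear in y: regressing (y_T + y_P) on X gives b_T + b_P term by term
gaps = [round(p + t - s, 4) for p, t, s in zip(CBRSP, CBRST, CBRS)]
print(gaps)
print("largest gap:", max(abs(g) for g in gaps))
```

Every gap is at the level of the fourth decimal place, which is what rounding of the printed coefficients would produce.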

Summary - CBRS Multiple Regression Model. Only twelve percent of the variance in the Classroom Behavior Rating Scale scores was accounted for by the filmed teaching episodes. Adding the other statistically significant variables, belief scores and occupational classification of the observers, increased the variance that can be identified only to seventeen percent of the total variance. In contrast to the Teacher Practices Observation Record, for which forty-eight percent of the variance in scores can be identified, the CBRS seems much less a reflection of the actual teaching behavior seen in the films. Although a large part of the variance in the observation scores is due to unisolated variables and error, the filmed teaching behavior has a much greater effect on observation scores than on rating scores. This would indicate that the observer-judges, in this situation at least, rate teachers on many criteria other than their professed beliefs about education and the actual classroom behavior of the teacher.







Table 19

Final Model CBRSP

Equation

$$
\begin{aligned}
y_2 = {} & 22.7161 - 7.5939\,x_1 - 8.1812\,x_2 - 4.3280\,x_3 - 2.7585\,x_4 \\
 & + 0.0659\,x_5 - 0.0203\,x_6 + 0.0292\,x_7 \\
 & + 3.78683\,x_{12} + 1.4535\,x_{13} - 0.1421\,x_{14}
\end{aligned}
$$

Analysis of Variance

Source        df     SS            MS          F
Regression    10     7,940.885     794.089     9.010*
Error         536    47,239.698    88.134

s = 9.3879

*denotes significance at .05 level

Summary

Variables                              R²       Inc. in R²
Films                                  .0978    .0978
Films and Occupation                   .1178    .0200
Films, Occupation & Belief Scores      .1439    .0261













Table 22

Final Model CBRST

Equation

$$
\begin{aligned}
y_3 = {} & 8.1052 - 2.9417\,x_1 - 3.9550\,x_2 - 1.9296\,x_3 - 2.5529\,x_4 \\
 & + 0.0282\,x_5 - 0.0062\,x_6 + 0.0027\,x_7 \\
 & + 1.3782\,x_{12} + 0.4116\,x_{13} + 0.1595\,x_{14}
\end{aligned}
$$

Analysis of Variance

Source        df     SS           MS          F
Regression    10     1,367.681    136.768     11.695**
Error         536    6,268.098    11.694

s = 3.4196

**denotes significance at .01 level

Summary

Variables                              R²       Inc. in R²
Films                                  .1357    .1357
Films & Occupation                     .1500    .0143
Films, Occupation & Belief Scores      .1791    .0291







Summary of F-tests to Determine Significance of Variables in the Model [illegible in source scan]



Table 25

Final Model CBRS

Equation

$$
\begin{aligned}
y_4 = {} & 30.8213 - 10.5357\,x_1 - 12.1362\,x_2 - 6.2576\,x_3 - 5.3114\,x_4 \\
 & + 0.0941\,x_5 - 0.0265\,x_6 + 0.0319\,x_7 \\
 & + 5.1650\,x_{12} + 1.8651\,x_{13} + 0.0174\,x_{14}
\end{aligned}
$$

Analysis of Variance

Source        df     SS            MS           F
Regression    10     15,481.511    1,548.151    10.907**
Error         536    76,078.139    141.937

s = 11.9137

**denotes significance at .01 level

Summary

Variables                              R²       Inc. in R²
Films                                  .1181    .1181
Films & Occupation                     .1388    .0207
Films, Occupation & Belief Scores      .1691    .0303










CHAPTER IV

PHASE II - OBSERVER-JUDGE RATINGS OF STUDENT TEACHERS



Purposes 

The difficulty in making meaningful judgments of teacher competence centers squarely on obtaining hard evidence about what teachers and their students do in the classroom. Therefore, this study has two major purposes:

1. To identify variables and combinations of variables that contribute significantly to variance in observations of the classroom behavior of student teachers.

2. To identify variables (including observation scores) that contribute significantly to variance in the evaluation of classroom behavior.



Procedures 

A total of 569 observer-judges made 2,859 observations and 953 evaluations of 407 student teachers from six teacher education institutions in California, Florida, Illinois, New York, and Wisconsin. Observations of the students' classroom behavior were made with the Teacher Practices Observation Record (TPOR), a 62-item sign system that measures the congruity of teaching methods with John Dewey's philosophy of experimentalism. Judgments of the quality of the observed classroom behavior were made on the Teacher Evaluation Scale (TES), a simple form for rating teachers along a six-point competent-incompetent continuum on six general teacher characteristics. In addition, scores on the Personal Beliefs Inventory (PBI) and the Teacher Practices Inventory (TPI), which measure the congruity of beliefs with Dewey's experimentalism, were obtained for each student teacher and observer-judge in an effort to assess the influence of their personal and educational philosophy on the observations and evaluations.

The student teachers in this study were drawn from those engaged in their final pre-service clinical experience under the auspices of the six cooperating colleges and universities during the winter and spring terms of 1965. The observer-judges consisted of student teacher supervisors, education professors, and academic professors from all six colleges, plus cooperating teachers and principals from the public schools participating in the regular teacher education programs of these six institutions.













Analysis of Data - Using Total Scores of Observations and Evaluations

The data were submitted to multiple regression analysis to discover which measured variables contributed to the scores given by the observer-judges on the Teacher Practices Observation Record (TPOR) and the Teacher Evaluation Scale (TES). Each of these two scores was treated separately, and in turn, as the predicted response in a series of three increasingly complex regression models, using first 12 independent variables, then 20, and finally a total of 69, including two-way interactions of these variables.

Several models were proposed in order to determine which variables contribute most to each response. Because of the limited size of the computer memory and the very large number of variables involved in any evaluation, independent variables could be introduced only as linear terms, and interactions between the variables could be explored only on a limited basis. The primary objective was to identify variables making significant contributions to the observation and evaluation scores.

"Significance" was determined in two ways:

1) The significance of the regression itself is tested by a simple analysis of variance. Consider the general linear model for m observations:

$$
Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon
$$

If it is desired to know whether the k independent variables account for a significant amount of response variance, we perform the test:

$H_0$: the model makes no contribution, i.e., $\beta_1 = \beta_2 = \cdots = \beta_k = 0$,
vs. $H_a$: the model makes a significant contribution.

It can be shown that under the null hypothesis $H_0$,

$$
F = \frac{\mathrm{SS(Regression)}/k}{\mathrm{SS(Error)}/\{m - (k + 1)\}}
$$

has an F distribution. We thus reject $H_0$, i.e., conclude that the model does make a significant contribution in accounting for the variance of the response, if the calculated F value exceeds the tabled $F_{\alpha}(\nu_1, \nu_2)$, where $\alpha$ is the level of significance, $\nu_1$ is the degrees of freedom for the regression ($k$ above), and $\nu_2$ is the degrees of freedom associated with error ($m - (k + 1)$ above).
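The test in 1) can be written directly from the formula above. The sketch assumes m observations and k predictors and, as an example, plugs in the Table 25 values for the final CBRS model (k = 10 with 536 error degrees of freedom, hence m = 547), reproducing the reported F of 10.907. The critical value $F_{\alpha}(\nu_1, \nu_2)$ must still be taken from a table or a statistics library.

```python
def regression_f(ss_regression, ss_error, k, m):
    """Overall F for a linear model with k predictors fitted to m observations.

    Returns the F value and its (numerator, denominator) degrees of freedom.
    """
    df_error = m - (k + 1)
    f_value = (ss_regression / k) / (ss_error / df_error)
    return f_value, (k, df_error)

# Table 25 (final CBRS model): SS(Regression) and SS(Error)
f_value, dfs = regression_f(15_481.511, 76_078.139, k=10, m=547)
print(round(f_value, 3), dfs)
```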

2) Once significance of regression has been determined, the variables that contribute most, or contribute significantly, must be identified. Suppose we wish to know whether variable $x_i$ is significant; we then perform the test

$H_0$: $\beta_i = 0$ vs. $H_a$: $\beta_i \neq 0$,

using

$$
t = \frac{\hat{\beta}_i}{s_{\hat{\beta}_i}}
$$

as the test statistic, where $\hat{\beta}_i$ is the estimator of the coefficient and $s_{\hat{\beta}_i}$ is its standard deviation. Here t is distributed as Student's t, and if $|t| > t_{\alpha/2}(\nu_2)$, the conclusion is that variable i makes a significant contribution.
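Computing the t statistics in 2) requires the standard deviations of the estimated coefficients, which for ordinary least squares are the square roots of the diagonal of MSE·(X'X)⁻¹. The NumPy sketch below, on simulated data in which only the first predictor has a real effect, returns the estimates, their standard errors, and t = β̂ᵢ / s_β̂ᵢ; each |t| would then be compared with the tabled t at m - (k + 1) degrees of freedom. The function name and the data are illustrative assumptions.

```python
import numpy as np

def coefficient_t_values(X, y):
    """OLS fit with intercept; return estimates, standard errors, and t = b / s_b."""
    design = np.column_stack([np.ones(len(y)), X])
    m, p = design.shape                        # p = k + 1 parameters
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    mse = (resid @ resid) / (m - p)            # SS(Error) / {m - (k + 1)}
    cov = mse * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))                 # standard deviation of each estimate
    return coef, se, coef / se

# Simulated data: x1 has a real effect, x2 has none
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 5.0 + 2.0 * X[:, 0] + rng.normal(size=200)
coef, se, t = coefficient_t_values(X, y)
print(np.round(t, 2))
```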



Certain disadvantages are inherent in the above method of analysis. The probability of making at least one incorrect decision grows as the number of coefficients tested in this manner increases, since the probability of rejecting $H_0$ when it is true is $\alpha$ for each test. This matters little for our objective, however: the aim of the study is to screen out those independent variables whose contribution seems to be insignificant. That we might include as significant some variables that in fact make little or no contribution is rather unimportant, since each of these will be explored in greater depth and, if not screened out by this regression, will likely be caught in future analysis.

Thus, variable coefficients whose calculated t-values are even close to significance are noted for further study. If a group of "dummy" variables is used to describe one particular factor (e.g., the five variables that describe the six institutions at which observations originated) and several of them have significant or nearly significant t-values, the factor is "screened," and its contribution is considered significant.

Table 26 is a description of the variables considered in the in- 
vestigation. 

I. Response: TPOR Total Score

Model 1: The general model was fitted with 10 independent variables:

PBI and TPI of observer                                      2 variables
PBI and TPI of student teacher                               2 variables
6 two-way interactions of TPI and PBI of student
  teacher and observer                                       6 variables









Table 26

Definition of Variables

Factor                                     No. of       Description
                                           Variables
Belief Scores of Observer                  2            Personal Beliefs Inventory of Observer
                                                        Teacher Practices Inventory of Observer
Belief Scores of Student Teacher           2            Personal Beliefs Inventory of Student Teacher
                                                        Teacher Practices Inventory of Student Teacher
Occupational Classification of Observer    4            Cooperating Teacher; Principal; Clinical Professor;
                                                        Methods Professor or Educ. Prof.; Academician
Institutional Affiliation                  5            Sacramento; Albany; Oneonta; Wisconsin;
                                                        Northwestern; Florida
Subject Matter of Observer                 1            Secondary, Elementary
Age of Observer                            1            --
Sex of Observer                            1            Female, Male
Subject Matter of Student Teacher          1            Secondary, Elementary
Age of Student Teacher                     1            --
Sex of Student Teacher                     1            Female, Male






The F-value was 9.78, which is significant at the .01 level. The regression model contributed a significant amount of information about the TPOR scores.



Model 2: The general model was fitted with 19 independent variables. They included the following:

Position or occupation of observer                           4 vars.
Institution at which observation took place                  5 vars.
Subject (Elem. or Sec.), Age, and Sex of observer            3 vars.
Subject, Age, and Sex of student teacher                     3 vars.
PBI and TPI of observer                                      2 vars.
PBI and TPI of student teacher                               2 vars.

All variables were included as linear terms, and no interactions were included.



The calculated F-value was 9.17, again significant at the 
.01 level. Thus, the model accounts for a significant 
amount of response variance. 



Model 3:

In addition to the above 19 independent variables, Model 3
included all 2-way interactions which were considered
pertinent. Limitations of the computer made it impossible
to consider all 2-way interactions of these 19 variables,
but the following 50 were included:

Position x Sex - Observer 4 vars.

Position x Sex - Student Teacher 4 vars.

Position x PBI - Observer 4 vars.

Position x TPI - Observer 4 vars.

Institution x PBI - Student Teacher 5 vars.

Institution x TPI - Student Teacher 5 vars.

Subject - Student Teacher x Subject - Observer 1 var.

Subject - Judge x PBI and TPI - Observer 2 vars.

Age - Judge x Age - Student Teacher 1 var.

Age - Judge x PBI and TPI - Observer 2 vars.

Sex - Judge x Sex and Age - Student Teacher 2 vars.

Sex - Judge x PBI and TPI - Observer 2 vars.

Age - Student Teacher x Sex and Subject - Student Teacher 2 vars.

Age - Student Teacher x PBI and TPI - Student Teacher 2 vars.

Sex - Student Teacher x PBI and TPI - Student Teacher 2 vars.

Subject - Student Teacher x PBI and TPI - Student Teacher 2 vars.

All 2-way interactions of PBI and TPI of Student Teacher and Observer 6 vars.
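The term counts in the list above can be checked by noting that crossing a categorical factor coded as g indicator columns with a single numeric variable yields g product columns, and crossing two single-column variables yields one. A minimal Python sketch under that assumption (the indicator-coding convention and the helper name are assumptions, not stated in the report; the list itself is transcribed from the text) tallies the 50 interaction terms:

```python
from itertools import combinations

# (interaction, number of product columns), transcribed from the Model 3 list
INTERACTIONS = [
    ("Position x Sex - Observer", 4),
    ("Position x Sex - Student Teacher", 4),
    ("Position x PBI - Observer", 4),
    ("Position x TPI - Observer", 4),
    ("Institution x PBI - Student Teacher", 5),
    ("Institution x TPI - Student Teacher", 5),
    ("Subject - Student Teacher x Subject - Observer", 1),
    ("Subject - Judge x PBI and TPI - Observer", 2),
    ("Age - Judge x Age - Student Teacher", 1),
    ("Age - Judge x PBI and TPI - Observer", 2),
    ("Sex - Judge x Sex and Age - Student Teacher", 2),
    ("Sex - Judge x PBI and TPI - Observer", 2),
    ("Age - Student Teacher x Sex and Subject - Student Teacher", 2),
    ("Age - Student Teacher x PBI and TPI - Student Teacher", 2),
    ("Sex - Student Teacher x PBI and TPI - Student Teacher", 2),
    ("Subject - Student Teacher x PBI and TPI - Student Teacher", 2),
]

# Last line of the list: every pair among the four belief scores, C(4, 2) = 6
BELIEF_SCORES = ["PBI - Student Teacher", "TPI - Student Teacher",
                 "PBI - Observer", "TPI - Observer"]
belief_pairs = list(combinations(BELIEF_SCORES, 2))

total_terms = sum(n for _, n in INTERACTIONS) + len(belief_pairs)
print(total_terms)  # 50


def product_columns(indicator_cols, numeric_col):
    """Hypothetical helper: one interaction column per indicator column."""
    return [[d * x for d, x in zip(col, numeric_col)] for col in indicator_cols]


# Toy data (hypothetical): 4 Position indicators over 3 observers, 0/1 Sex
position = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1]]
sex = [0, 1, 1]
print(len(product_columns(position, sex)))  # 4, as in "Position x Sex - Observer"
```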

An F-value of 4.49 resulted, significant at the .01 level. In
addition, to test the significance of the interaction terms as a
group, the following test was conducted.

$H_0$: the interaction terms are insignificant, i.e.,

$$\beta_{20} = \beta_{21} = \cdots = \beta_{69} = 0,$$

where $\beta_{20}, \ldots, \beta_{69}$ represent the 50 coefficients of the interaction
terms.

$H_a$: the interactions make a significant contribution.

Test Statistic:

$$F = \frac{(SSE_2 - SSE_1)/(k_2 - k_1)}{SSE_1/k_1}$$

where $SSE_2$ and $k_2$ are the sums of squares due to error and
degrees of freedom, respectively, associated with error in the
reduced model (no interactions), and $SSE_1$ and $k_1$ are similar
notations for the complete model (interactions included).

Then

$$F = \frac{(3895.16 - 3431.83)/(968 - 918)}{3431.83/918} = 2.48$$

Thus the interactions considered as a group make a significant 
contribution in accounting for response variance. 
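The test statistic above is the standard partial (nested-model) F for adding a block of terms to a regression. A minimal Python sketch that recomputes it from the printed sums of squares and degrees of freedom; the function name is illustrative only, and comparing the result with a critical value would still require an F table or a statistics library:

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """Partial F statistic for a block of terms added to a regression.

    sse_reduced, df_reduced: error SS and error df without the block.
    sse_full, df_full: error SS and error df with the block.
    Returns (F, numerator df, denominator df).
    """
    q = df_reduced - df_full  # number of terms tested jointly
    f = ((sse_reduced - sse_full) / q) / (sse_full / df_full)
    return f, q, df_full


# Figures printed for the TPOR response (no interactions vs. interactions)
f_tpor, q, df = partial_f(3895.16, 968, 3431.83, 918)
print(f"F = {f_tpor:.2f} on ({q}, {df}) df")  # F = 2.48 on (50, 918) df
```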

II. Response: Evaluations of Student Teachers 

Model 1: The general model was fitted with 15 independent variables.

These included the following:

TPOR total score 1 var.

PBI and TPI of Judge and Student Teacher 4 vars.

All possible 2-way interactions of the above 5 variables 10 vars.

The model was highly significant in its contribution to 
response variation, with an F-value of 25.63. 

Model 2: The general model was fitted with 20 independent variables, 

all introduced in linear form with no interactions included: 



Position of Judge 4 vars. 

Institution 5 vars. 

Subject, Age, and Sex of Judge 3 vars. 

Subject, Age, and Sex of Student Teacher 3 vars. 

TPOR score 1 var. 

PBI, TPI of Judge and Student Teacher 4 vars. 



The F-value was calculated to be 19.33, indicating that the
regression model is highly significant.

Model 3: The general model was fitted with the 20 independent 

variables above plus 69 two-way interactions of these 
variables. The interactions included were: 

Position x Sex - Student Teacher 4 vars.

Position x Sex - Judge 4 vars.

Position x TPOR 4 vars.

Position x Beliefs - Judge 8 vars.

Institution x TPOR 5 vars.

Institution x Beliefs - Student Teacher 10 vars.

Subject - Judge x Subject - Student Teacher 1 var.

Subject - Judge x TPOR 1 var.

Subject - Judge x Beliefs - Judge 2 vars.

Age - Judge x TPOR 1 var.

Age - Judge x Beliefs - Judge 2 vars.

Sex - Judge x TPOR 1 var.

Sex - Judge x Beliefs - Judge 2 vars.

Sex - Judge x Sex - Student Teacher 1 var.

Sex - Judge x Age - Student Teacher 1 var.

Age - Student Teacher x Sex - Student Teacher 1 var.

Age - Student Teacher x Subject - Student Teacher 1 var.

Age - Student Teacher x TPOR 1 var.

Age - Student Teacher x Beliefs - Student Teacher 2 vars.

Sex - Student Teacher x TPOR 1 var.

Sex - Student Teacher x Beliefs - Student Teacher 2 vars.

Subject - Student Teacher x TPOR 1 var.

Subject - Student Teacher x Beliefs - Student Teacher 2 vars.

Age - Judge x Age - Student Teacher 1 var.

All 2-way interactions of TPOR, and PBI,
TPI of Judge and Student Teacher 10 vars.



An F-value of 6.11 resulted from the use of the above
model, showing the regression contribution significant
at the .01 level. Also, the test of interaction significance
was run to determine whether the interaction terms,
considered as a whole, make a contribution in accounting
for evaluation variance. An F-value was calculated as
follows:

$$F = \frac{(SSE_2 - SSE_1)/(k_2 - k_1)}{SSE_1/k_1} = \frac{(18979.86 - 16544.45)/(967 - 898)}{16544.45/898} = 1.93$$

Thus, the interactions' contribution is significant at the
.01 level.
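The printed degrees of freedom can also be cross-checked. Assuming each model contains one intercept (the report does not say so explicitly), error df equals the number of observations minus the number of predictors minus one, so the reduced model (20 linear terms) and the complete model (20 linear terms plus 69 interactions) should imply the same number of observations. The sketch below makes that check and recomputes F from the printed figures; it gives about 1.92, slightly below the reported 1.93. The helper name is hypothetical.

```python
def implied_n(error_df, n_predictors):
    """Observations implied by an error df, assuming one intercept term."""
    return error_df + n_predictors + 1


# TES response: reduced model has 20 linear terms; complete model adds 69
n_reduced = implied_n(967, 20)
n_full = implied_n(898, 20 + 69)
print(n_reduced, n_full)  # 988 988

f_tes = ((18979.86 - 16544.45) / (967 - 898)) / (16544.45 / 898)
print(round(f_tes, 2))  # 1.92
```

The same check applied to the TPOR comparison (19 linear terms plus 50 interactions, error df 968 and 918) also implies 988 observations.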



Findings of Total Score Analysis

I. Variables which contribute significantly to the variance in
observations of the classroom behavior of student teachers as measured
by scores on the TPOR are shown in Table 27.

II. Variables which contribute significantly to the variance in
evaluations of teaching competence as measured by TES scores are shown
in Table 28.



Discussion of Total Score Analysis 

I. What variables and combinations of variables contribute 
significantly to the variance in observations of the classroom behavior 
of student teachers as measured by scores on the TPOR? 







Table 27

Variables Which Contribute to Variance in
Observation of Classroom Behavior

Response: TPOR Total Scores

Variable                                                    *t-value

Model 1: Observer PBI-TPI interaction                         -4.90
         Student Teacher TPI-Observer PBI interaction          1.97

Model 2: Student Teacher TPI                                   4.29
         Student Teacher PBI                                  -2.35
         Age of Observer                                       4.37
         Occupation of Observer (4)                            4.37 to -5.42

Model 3: Observer Age x St. Teacher Age                       -1.76
         Observer Age x Observer TPI                           2.19
         Observer Sex x Observer TPI                           2.45
         St. Teacher Sex x St. Teacher PBI                    -2.23
         St. Teacher PBI x St. Teacher TPI                     1.53
         Observer PBI x Observer TPI                          -2.39
         Observer Sex x Observer Occupation (4)                2.51 to -3.63
         Observer PBI x Observer Occupation (4)                1.62 to -2.28
         Observer TPI x Observer Occupation (4)                2.70 to -2.24
         Institution x St. Teacher TPI (5)                     2.34 to -1.81

*|t| ≥ 1.96, significant at .05 level; |t| ≥ 1.645, significant at .10 level





Table 28

Variables Which Contribute to Variance in
Evaluations of Teaching Competence

Response: TES Total Scores

Variable                                                    *t-value

Model 1: TPOR x Judge TPI                                      2.97
         Judge PBI x Judge TPI                                -3.88
         Judge PBI x St. Teacher TPI                           1.98
         Judge TPI x St. Teacher PBI                           1.75

Model 2: TPOR                                                 13.44
         St. Teacher Sex                                      -3.67
         Judge TPI                                            -2.63
         Judge PBI                                            -1.94
         Student Teacher Age                                   2.23
         Judge Occupation (4)                                  2.21 to -2.43
         Institution (5)                                       3.41 to -2.85

Model 3: TPOR x Judge TPI                                      2.24
         TPOR x St. Teacher PBI                               -1.99
         Judge PBI x St. Teacher PBI                          -1.85
         Judge TPI x St. Teacher PBI                           1.62
         Judge Subject x Judge TPI                            -1.80
         Judge Sex x St. Teacher Sex                           3.26
         St. Teacher Age x St. Teacher Subject                 2.71
         St. Teacher Sex x St. Teacher TPI                    -1.81
         Judge Sex x Judge Occupation (4)                      2.09 to -2.00

*|t| ≥ 1.96, significant at .05 level; |t| ≥ 1.645, significant at .10 level






A. How do the beliefs of the student teacher influence the TPOR
score received? When the PBI and TPI of the student teacher are intro-
duced linearly in Model 2, their relations to the TPOR score received
seem to be conflicting in nature. While the TPI is in direct proportion
to the TPOR score, the PBI seems to be inversely proportional to the
TPOR. This would seem to deliver a damaging blow to the value of the
PBI as a predictor of observed behavior, except that we have had repeated
experience with the "hidden power" of the PBI total score before and
know it for what it is: a real sleeper.

Model 1 indicates that high PBI-TPI scores of the observer-judge
make the positive effect of a high student teacher TPI more pronounced,
as does a high student teacher PBI. If a student teacher who has a high
TPI also has a high PBI, the probability of his receiving a high TPOR is
sharply increased. Thus, the PBI acts something like an "additive" in
the gas tank of the high TPI student teacher: it provides the "extra
kick" that makes the difference. Furthermore, as these three scores (the
student teacher's PBI and the PBI and TPI of the observer-judge)
decrease, the relation of the student teacher's TPI to the TPOR weakens,
and may even become inverse. Clearly, the evidence indicates that the
complex relationships within the chemistry of the beliefs of both the
student teacher and his observer exercise an influence upon the
observation (TPOR) score which cannot be ignored.
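The reasoning above rests on how a two-way interaction term changes a slope. In a regression containing a product term (a generic illustration, not the report's fitted equation), the effect of one predictor depends on the level of the other:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 \quad\Longrightarrow\quad \frac{\partial \hat{y}}{\partial x_1} = b_1 + b_3 x_2$$

Here $x_1$ could stand for the student teacher's TPI and $x_2$ for a moderating belief score. A positive $b_3$ means the TPI slope grows as the moderator rises, and if $b_1 + b_3 x_2$ drops below zero for low values of $x_2$, the TPI relation becomes inverse, which is the pattern the paragraph above describes.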

B. How do the beliefs of the observer-judge influence the TPOR
scores given? The PBI and TPI of the observer-judge seem to enjoy a
more straightforward relationship with the TPOR scores. The PBI of the
observer-judge is inversely proportional to the TPOR, with this effect
seemingly becoming more pronounced as the TPI of the observer-judge
increases. As the TPI of the student teacher being observed increases,
the relationship between the observer-judge's PBI and the TPOR becomes
less inverse. In other words, it takes a student teacher with a high
TPI to break down the propensity of a high PBI observer-judge for giving
low TPOR scores. The TPI of the observer-judge also appears to be
inversely proportional to the TPOR given, but this effect is very slight
unless the PBI of the observer-judge and the PBI of the student teacher
are high.

C. How do the beliefs of the student teacher and the observer-
judge interact with other descriptive factors to influence TPOR scores?
When the belief scores of the students and observers and their inter-
actions with various other variables were examined in Model 3, the TPI
of the student teacher did not vary significantly in the nature of its
previously stated relationships. A slightly significant interaction
was found between his TPI and the institution he attended, but this seems
more relevant to a study aimed at the comparative evaluation of the
cooperating institutions than to the central purposes of our investiga-
tion.

On the other hand, the PBI of the student teacher, which appeared
in Model 2 to be negatively related to the TPOR, appears in Model 3 to
act in conjunction with several other factors to relate positively with
the TPOR. The student teacher's PBI tends to become positively correlated
to the TPOR as his TPI score increases. This effect is even more pro-
nounced if the teacher is female, and/or if the observer-judge has a low
PBI score. Remember, low PBI observers tend generally to give high TPOR
scores. They have difficulty differentiating experimental from non-
experimental behavior in the classroom because they neither understand
nor appreciate the underlying theoretical dimensions of the TPOR. When
they "smell" a student who differs diametrically from them at the level
of basic beliefs (a high PBI student teacher), they tend to "punish" him
with a high TPOR score. Likewise, females tend generally to be "lukewarm"
toward experimentalism, but when one of them decides to throw a high PBI
in with a high TPI she becomes "red hot," goes all the way in her
enthusiasm for experimentalism, and receives a very high TPOR.

The TPI of the observer-judge generally has an inverse relation
with the TPOR. However, the observer-judge with a high TPI tends to
give even lower TPOR scores if she happens to be a relatively young
female observer-judge with a high PBI score.

A significant relation also seems to exist between the observer's
belief scores and his position (professional occupation). Detailed
analysis of this factor has already been reported elsewhere,¹ showing
that college professors (both educationists and academicians) tend
toward higher PBI scores than do students, cooperating teachers, princi-
pals, and supervisors of student teachers.

Personal characteristics of the observer-judges and student
teachers also seem to make certain significant contributions in ac-
counting for TPOR variance. Age of the observer seems to be in direct
proportion to the TPOR: as the observer-judge's age increases, so does
the TPOR score given. This effect is much less pronounced if the stu-
dent teacher is also older, but more pronounced if the "old" observer-
judge's TPI is high. Older observer-judges, even those with high TPI
scores, tend to have low PBI scores. This acts to inflate the TPOR
scores given by them, as persons (of any age) who are caught in a
serious discrepancy between a high TPI and a low PBI seem to have
trouble clearly differentiating experimental from non-experimental
behavior on the TPOR.

The most significant contributor among the measured personal 
characteristics was the sex of the student teacher. Generally, the 
male teacher seems to receive a higher TPOR than does a female, but this 
effect is much less pronounced, and possibly even slightly reversed, if 
the TPI and PBI of the female teacher are high. 

II. What variables and combinations of variables contribute
significantly to the variance in judgments made with respect to the
quality of teaching which was observed?



¹Bob Burton Brown and Tom Rusk Vickery, "The Belief Gap in Teacher
Education," The Journal of Teacher Education, 18:417-421, Winter, 1967.



A. How does the observation (TPOR) score relate to judgments
of teacher competence (TES score)? By far the most significant pre-
dictor of the evaluation given the teacher is the TPOR score. This
relationship is a very pronounced positive correlation, with the evalua-
tion having an even greater tendency to be favorable if the high TPOR
score is accompanied by a high observer TPI and by low student teacher
belief scores on both the PBI and TPI. These findings cut in opposite
directions: (1) the more the teacher's behavior is observed to be in
agreement with Dewey (high TPOR), the more likely it is to be evaluated
favorably (high TES), and vice versa; and (2) the lower the student
teacher's agreement with Dewey's beliefs, the more likely he is to get
a favorable rating. However, there is a catch to this statement of
which the reader must be made aware. The TPOR scores tended to be
low; i.e., on the whole the student teachers did not often employ class-
room practices consistent with Dewey's experimentalism. Therefore,
relatively non-experimental teaching performances received favorable
evaluations, although the more experimental these became the better
their rating.

B. How do the beliefs of the observer-judges influence their
judgments of teacher competence? The PBI and TPI of the observer-judge
seem to have a relationship with evaluation not dissimilar to that with
the TPOR: both are inversely proportional. In short, the higher the
observer-judge's expectations with respect to experimentalism (high PBI-
TPI), the lower his estimation of the teaching observed (low TES). Like-
wise, the lower the observer-judge's PBI-TPI, the higher his evaluations.
However, the PBI's inverse effect becomes less pronounced if the TPIs of
both the observer-judge and the student teacher are relatively low.

The TPI of the observer-judge seems to be even more negative than
the PBI in its correlation to the evaluation, and the negative effect
becomes more emphatic if the TPI is accompanied by a high observer-judge
PBI. The effect is somewhat modified if the TPOR score given is rela-
tively high, and/or if the belief scores of the student teacher are
relatively high.

One might conclude that the observer-judges expected to ob-
serve highly experimental behavior exhibited in the classroom by the
student teachers and, failing to see it, gave them poor ratings. In
fact, almost the opposite happened. Most of the observer-judges
turned out to be ambivalent in their beliefs regarding Dewey's experi-
mentalism, observed classroom behavior which was in disagreement with
the educational practices advocated by Dewey about twice as often as it
was in agreement with them, and evaluated that teaching on the whole as
being "very good" to "excellent."

C. How do the beliefs of the student teachers influence the
ratings given their observed teaching behavior? The belief scores of
the student teachers seem less significant in their contribution to
the evaluation than do those of the observer-judge. Of the two belief
measures, the TPI of the student teacher is the more significant in
predicting the evaluation, with a generally positive correlation between
the TPI and the evaluation. This is more pronounced if the TPI of the
observer-judge is high, and is modified somewhat if the observer-
judge's PBI is high.

The student teacher's PBI seems to enjoy a noticeable positive
correlation with the evaluation only if the TPI of the observer-judge
is relatively high. Otherwise, the effect appears nil, or even slightly
negative. When pertinent interactions are introduced in Model 3, the
evidence indicates that a relatively high TPOR score or a high
observer-judge PBI score will tend to modify this effect of the
student teacher PBI.

D. How do other descriptive factors influence ratings of teacher
competence? Model 3 interactions show that the positive correlation
between evaluation and student teacher TPI is more pronounced for a
female teacher than for a male teacher.

Possibly the most interesting of the significant interactions
is the one found between the sex of the observer-judge and the sex of
the student teacher. The indication is that the rating given tends to be
higher if the observer-judge and the student teacher are of the same
sex. Also, an older student teacher seems to receive higher evalua-
tions than a younger one, with this effect becoming even more pronounced
for an elementary teacher than for a secondary teacher.



Summary of Total Score Analysis 

In summary, it should be emphasized that this analysis has chipped
only small pieces from a large block of data. That so many important,
simple relationships were discovered is highly promising. Areas worthy
of further study have been pinpointed, opening the way for the search
for many heretofore obscure and subtle relationships involved in the
problems of observing and judging classroom behavior. For example, a
possible explanation for the often contradictory belief score relation-
ships may be that the various factors within the instruments vary in
their contribution to the TPOR and TES ratings given, thus presenting an
obscure picture when treated as a single score, as we have done here.

This determination of factor significance is reported next. 



Re-examination of the Data Using Factor Scores



The Personal Beliefs Inventory (PBI), the Teacher Practices
Inventory (TPI), and the Teacher Practices Observation Record (TPOR)
are composed of two different types of factors: (1) "theoretical"
factors, and (2) "empirical" factors. The theoretical factors are
those which were extracted from the theoretical framework of John
Dewey's experimentalism and built into the instruments from the begin-
ning. The empirical factors are those which were identified as the
result of submitting the empirical data collected in Phase I to factor
analysis. Theoretical and empirical factors are listed and described
in Appendix C.









Analysis of Data Using Factor Scores 

The theoretical and empirical factors were included in several 
multiple linear regressions. The purpose of these regressions was to 
further explore the results of previous regression models by breaking 
the total scores of the Personal Beliefs Inventory (PBI) , Teacher 
Practices Inventory (TPI) , and Teacher Practices Observation Record 
(TPOR) into factors. In this way it was hoped that some of the con- 
flicting conclusions drawn from the models using only total scores 
(see the regression analysis in the previous section of this chapter) 
could be explained. 

Four models were proposed. The first two treated the TPOR as 
the response, one employing the empirical factors of the PBI and TPI 
as independent variables, the second employing the theoretical factors 
of those instruments as independent variables. In the last two models 
the sum of the six Teacher Evaluation Scale (TES) scores was treated 
as the response; in the first the empirical factors of the PBI, TPI, 
and TPOR were the independent variables, and in the second the theo-
retical factors of the same instruments were proposed as independent
variables. 

In all models the variables were introduced only in a linear 
manner since the main objective was to determine basic observation- 
factor relationships; any accurate predictive equation would be inci- 
dental. It was found, however, that each of the models accounted for 
a significant percentage of the variance of the response. Statistical 
significance of a particular factor was determined by a simple t-test 
upon its coefficient in the regression equation. 

Table 29 lists the multiple R² and the F-value resulting from
the test of each model's significance.



Table 29

F-Values of Theoretical and Empirical Factors

Response    Factors        R²        F

TPOR        Theoretical    .1248     3.98**
            Empirical      .1241     5.48**

TES         Theoretical    .3544    12.20**
            Empirical      .3320    14.29**

**Denotes significance at the .01 level.









Findings of the Factor Scores

The tables which follow list the factors which were significant
at the .05 level (|t| ≥ 1.96) in descending order of the absolute values
of their t-values. Those which were not significant at the .05 level, but
which were at the .10 level (|t| ≥ 1.645), are starred (*). Included with
the name of each factor is the name of the instrument to which the factor
belongs, plus an indication of whether the score belonged to the student
teachers (ST) or the observer-judges (OJ). A minus sign before the
t-value indicates that the factor was inversely proportional to the
regression response; without a minus sign, the factor score was in
direct proportion to the response.
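The reporting conventions just described amount to a simple rule applied to each coefficient's t-value. A small Python sketch (the function name and row layout are illustrative, not from the report) applies the stated thresholds, reads the sign as the direction of the relationship, and orders rows by |t|, using a few rows transcribed from Table 30:

```python
def describe(t):
    """Classify a regression t-value by the conventions stated above."""
    size = abs(t)
    if size >= 1.96:
        level = ".05"
    elif size >= 1.645:
        level = ".10 (*)"
    else:
        level = "not significant"
    direction = "inverse" if t < 0 else "direct"
    return level, direction


# A few rows from Table 30 (TPOR regression, theoretical factors)
ROWS = [
    ("PBI T-4 Emotions and Intellect (ST)", -1.93),
    ("PBI T-6 Knowing and Doing (OJ)", -3.77),
    ("TPI T-3 Generation of Ideas (ST)", -1.76),
    ("TPI T-11 Mechanical Following of an Established Method (ST)", 3.19),
]

# The tables list factors in descending order of |t|
for name, t in sorted(ROWS, key=lambda row: abs(row[1]), reverse=True):
    level, direction = describe(t)
    print(f"{t:6.2f}  {level:15}  {direction:7}  {name}")
```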



Table 30

Teacher Practices Observation Record (TPOR) Regression

Theoretical Factors

Instrument   Factor                                                    t-value

PBI          T-6   Knowing and Doing (OJ)                               -3.77
PBI          T-3   Science and Morals (OJ)                              -3.46
TPI          T-11  Mechanical Following of an Established Method (ST)    3.19
PBI          T-2   Change and Certainty (OJ)                             2.71
TPI          T-9   Reliance Upon Extrinsic Motivation (OJ)               2.31
TPI          T-2   Development of Challenging Problem (ST)               2.20
TPI          T-1   Situation of Experience (ST)                          2.09
PBI          T-4   Emotions and Intellect (ST)                          -1.93*
TPI          T-3   Generation of Ideas (ST)                             -1.76*



Table 31

Teacher Practices Observation Record (TPOR) Regression

Empirical Factors

Instrument   Factor                                          t-value

PBI          E-6   Nature of Learning (OJ)                    -5.92
TPI          E-1   Evils in Education (OJ)                     4.06
PBI          E-1   Science and Morality (OJ)                  -3.50
TPI          E-1   Evils in Education (ST)                     2.89
TPI          E-3   Hard-Nose Teacher (ST)                      2.41
TPI          E-3   Hard-Nose Teacher (OJ)                     -2.23
PBI          E-2   Mind vs. Body & Emotions (OJ)               2.20
TPI          E-5   Tough Problem (ST)                          1.77*









Table 32

Teacher Evaluation Scale (TES) Regression
Theoretical Factors

Instrument   Factor                                               t-value

TPOR         T-1  Nature of the Situation                           7.02
TPOR         T-7  Motivation and Control                            5.17
TPOR         T-4  Use of Subject Matter                             3.93
TPI          T-5  Development of Reasoned Hypotheses (OJ)          -3.23
TPOR         T-5  Evaluation                                        3.10
TPI          T-9  Reliance Upon Extrinsic Motivation (OJ)          -3.07
TPOR         T-2  Nature of the Problem                            -2.68
PBI          T-6  Knowing and Doing (ST)                            2.47
PBI          T-4  Emotions and Intellect (OJ)                       2.35
TPI          T-5  Development of Reasoned Hypotheses (ST)           2.30
TPOR         T-3  Development of Ideas                              2.16
PBI          T-2  Change and Certainty                              2.11
TPI          T-3  Generation of Ideas                               2.10




Table 33

Teacher Evaluation Scale (TES) Regression
Empirical Factors

Instrument   Factor                                               t-value

TPOR         E-7  Pupil Activity                                    8.18
TPOR         E-5  Subject Matter Quality                            5.32
TPOR         E-6  Generation and Testing of Hypotheses              5.11
PBI          E-3  Knowledge for Its Own Sake (OJ)                  -3.49
TPI          E-1  Evils in Education (OJ)                          -3.11
PBI          E-3  Knowledge for Its Own Sake (ST)                   2.83
PBI          E-5  Religion (ST)                                    -2.39
TPOR         E-2  Rigidity - Teacher Control                        2.21
TPI          E-3  Hard-Nose Teacher (ST)                            2.12
PBI          E-5  Religion (OJ)                                    -1.97






Discussion of the Factor Scores 



I. What theoretical factors contribute significantly to variance
in observations of the classroom behavior of student teachers as measured
by scores on the TPOR?

A. How do beliefs of the student teacher influence the TPOR score
received? The student teacher's TPI beliefs on items involving the
"Mechanical Following of an Established Method" (TPI Factor T-11)
exercised the strongest influence on the observational score. Teachers
who said they believed in using lockstep methods got low TPOR scores;
those who rejected such beliefs got high TPOR scores. Student teachers
who agreed with Dewey that teachers should "Develop Challenging Problems"
(TPI Factor T-2) and should involve pupils in "Situations of Experience"
(TPI Factor T-1) received high TPOR scores, and vice versa. However,
student teachers who agreed with Dewey that teachers should encourage
pupils to "Generate Ideas" (TPI Factor T-3) tended to get low TPOR
scores. An easy explanation for this inverse relationship may be that
most teachers say that creativity should be encouraged, but relatively
few do anything about it in the classroom.

The only factor in the PBI scores of the student teachers which
was found to contribute significantly to the TPOR score consisted of
items relating to "Emotions and Intellect" (PBI Factor T-4), and this
relationship was inverse. Could it be that these four items accounted
for the confusing inverse relationship found between the TPOR score and
the total PBI score in the earlier analysis? In any event, the other
five PBI factors, which comprise ninety percent of the total PBI score,
failed to show the significant inverse relationship we got using the
total score. This, of course, supports and more fully explains the
"hidden power" of the PBI score.

B. How do beliefs of the observer-judge influence the TPOR
score given? Observer-judges who were in agreement with Dewey on PBI
beliefs involving the relationship of "Knowing and Doing" (PBI Factor
T-6) and of "Science and Morals" (PBI Factor T-3) tended to see teaching
which is contrary to that advocated by Dewey, and vice versa. This
inverse relationship to the TPOR score is consistent with the findings
which resulted from the analysis of the total PBI scores of the observer-
judges. Breaking the total PBI scores into theoretical factors, how-
ever, did yield one new finding: beliefs involving "Change and Certainty"
(PBI Factor T-2) exercise a direct or positive relationship on the ob-
servational (TPOR) score. Apparently, relativistic observer-judges are
more tolerant of what they are willing to call experimental teaching
than are observer-judges who share a broader agreement with Dewey's
philosophy. This may also indicate that relativism is the easiest
aspect of Dewey's experimentalism to swallow, and is shared by some
observer-judges who are not otherwise of an experimental mind.

Clearly, this examination shows that PBI factors exert a stronger
influence on TPOR scores given by the observer-judges than do TPI
factors. Only items relating to "Reliance Upon Extrinsic Motivation"
(TPI Factor T-9) contribute significantly to the influence of the TPI
on the observational score.






II. What empirical factors contribute significantly to variance
in observations of the classroom behavior of student teachers as
measured by scores on the TPOR?

A. How do beliefs of the student teacher influence the TPOR
score received? Student teachers who agree with John Dewey by rejecting
the sixteen items in the "Evils in Education" factor (TPI Factor E-1),
and by also rejecting the three items characterizing the "Hard-Nose
Teacher" (TPI Factor E-3), tend to receive high TPOR scores. Teachers
who accept these same beliefs tend to receive low TPOR scores. Teachers
who agree with Dewey by accepting the two statements making up the
"Tough Problem" factor (TPI Factor E-5) are inclined to be seen as
experimental, i.e., to receive high TPOR scores. These findings indicate
that restrictive, corrective, or police-like tactics are not necessary
in order to engage pupils with substantial, challenging, or tough
problems capable of stimulating thought. These data also seem to in-
dicate that it is the educational beliefs (TPI score) of the student
teachers, rather than the philosophical beliefs (PBI score), which are
most predictive of how they will be observed to teach (TPOR score).

B. How do beliefs of the observer-judge influence the TPOR
score given? The more the observer-judge agrees with Dewey on the
"Nature of Learning" (PBI Factor E-6) and on the "Science and Morality"
items (PBI Factor E-1), the more likely he is to give out low TPOR scores.
Once again the inverse relationship between the observer-judge's PBI
score and the TPOR score is demonstrated, virtually duplicating the
results obtained in the foregoing analysis of theoretical factors. How-
ever, in the present analysis, we did find a direct relationship between
the TPOR and the "Mind vs. Body and Emotions" factor (PBI Factor E-2),
showing again that an inverse PBI-TPOR relationship does not apply across
the board.

Incidentally, comparison of the magnitude of t-values indicates 
that the empirical factors of the PBI and TPI were more powerful pre- 
dictors of TPOR scores than were the original theoretical factors. 

III. What theoretical factors contribute significantly to the
variance in observer-judge evaluations (TES scores) regarding the quality
of teaching?

A. How do TPOR factors relate to evaluation of teaching (TES
score)? Five of the seven theoretical factors of the TPOR relate
directly and positively to the TES score evaluations of teacher behavior.
One factor did not turn out to be significant at all, and one factor,
"Nature of the Problem" (TPOR Factor T-2), showed an inverse relationship
with evaluation. Teachers who otherwise agree with Dewey's experi-
mentalism tend not to agree with him regarding the nature of problems to
be dealt with by pupils in school, and even the most experimental of
teachers rarely organize classroom activities around problems of the
nature advocated by Dewey. Therefore, it is not surprising that this
factor shows an inverse relationship with the TES score in this analysis.













Even so, these data corroborate our previous findings that the TPOR 
score is the most powerful single predictor of the evaluation score 
given the teacher. 

B. How do belief factors of the observer-judges influence the
evaluations (TES scores) given the student teacher? The more strongly
the observer-judge believes in the "Development of Reasoned Hypotheses"
(TPI Factor T-5), the more likely he is to give student teachers a
lower evaluation. Likewise, he tends to give a low evaluation score if
he agrees with Dewey in condemning "Reliance Upon Extrinsic Motivation"
(TPI Factor T-9). Apparently these two factors make the strongest con-
tribution to the general inverse relationship between educational be-
liefs (TPI) and evaluation of teacher competence (TES) found in the
earlier analysis using total TPI scores.

Whereas PBI Factor T-4, "Emotions and Intellect," had an inverse
effect on the TPOR score, it has a direct effect on the evaluation (TES)
score. This suggests that observer-judges who believe in the
continuity of emotions and intellect see very little experimental
teaching in classrooms but tend to like what they see. It could be
that the items in the "Emotions and Intellect" factor of the PBI
attract people who are not otherwise in agreement with
experimentalism, increasing the chances that nonexperimental
teaching will be given higher evaluations than if this factor attracted
only thoroughgoing experimentalist observer-judges.

C. How do belief factors of the student teachers influence the
evaluations (TES scores) they receive? Student teachers who have high
scores on PBI Factor T-6, "Knowing and Doing," and on TPI Factor T-5,
"Development of Reasoned Hypotheses," are likely to be given high
evaluation scores, and vice versa. It is interesting to recall that
observer-judges who hold high scores on the "Development of Reasoned
Hypotheses" factor tend to give low evaluation scores. Apparently,
when they do encounter a student teacher who shares their experimental
views on this factor, they reward that teacher with a high evaluation.

IV. What empirical factors contribute significantly to the
variance in observer-judge evaluations (TES scores) regarding the
quality of teaching?

A. How do TPOR factors relate to evaluation of teaching (TES
score)? Again, TPOR factors lead the way in influencing the evaluation
score, directly and positively. Teachers who were seen to provide for
a great deal of "Pupil Activity" in the classroom were given higher
ratings (TES scores) than teachers who did not. Likewise, teachers who
provided "Subject Matter Quality" of a challenging nature which went
beyond regurgitation of textbook answers were given higher ratings than
teachers who did not. Teachers who engaged pupils in activities
calculated to "Generate and Test Hypotheses," or, in other words, those
who were seen to teach in the "hypothetical mode," were rated as better
teachers than those who did not. Teachers who refrained from exercising
rigid and tight control of classroom activity got better ratings than
those whose behavior was characterized by "Rigidity and Teacher Control."






B. How do belief factors of the observer-judges influence the
evaluations (TES scores) given the student teachers? Observer-judges
who agreed with Dewey in his rejection of the notion of "Knowledge for
Its Own Sake" (PBI Factor E-3) tended not to like the teaching they
observed, presumably because they saw so much teaching which emphasized
the acquisition of knowledge and skills as an end in itself.
Nonexperimental observer-judges who believed in "Knowledge for Its Own
Sake" tended to give favorable ratings, presumably because they saw
plenty of evidence of this kind of teaching.

Observer-judges who agreed with Dewey in the rejection of conventional
religious beliefs (PBI Factor E-5) tended to give lower ratings,
and vice versa.

The large "Evils in Education" factor (TPI Factor E-1) had a
pronounced inverse relationship with the evaluation scores. Observer-judges
who accepted these "evils" tended to like what they saw; those
who rejected them did not. Presumably, there were plenty of these
"evils" in evidence.

C. How do belief factors of the student teachers influence
the evaluations (TES scores) they receive? Student teachers who agreed
with Dewey in his rejection of "Knowledge for Its Own Sake" (PBI
Factor E-3) received higher evaluations than those who disagreed with
Dewey on this score. We should recall that observer-judges who held
high scores (agreement with Dewey's experimentalism) on this same
factor tended to give low evaluations. Therefore, we may conclude that
when they did encounter a student teacher who shared their views about
"Knowledge for Its Own Sake," they rewarded that teacher with a favorable
evaluation.

A similar comparison can be made on the "Religion" factor (PBI
Factor E-5). Student teachers who held conventional views on religion
received high evaluations, and those who shared Dewey's experimental
views on religion tended to receive low evaluations. It should be
pointed out that the majority of observer-judges held conventional or
nonexperimental religious beliefs, as did the majority of student
teachers. However, a substantial minority of the observer-judges
strongly opposed traditional religious beliefs, and this group was most
critical of the teaching they observed, regardless of the views of the
student teachers on religion. They did not seem to reward (or recognize?)
student teachers who shared their religious beliefs.

Experimental student teachers who rejected beliefs that teachers 
should be "hard-nosed" (TPI Factor E-3) tended to get better ratings 
than those who believed the teacher should maintain strict disciplinary 
control. 



Summary of Factor Score Analysis 

By breaking scores of the belief and observation instruments into
theoretical and empirical factors, it has been possible to explore more
fully the interrelationships between the behavior of observer-judges and
the student teachers they observed and evaluated. The examination of
factors has more clearly explicated the inverse relationships between the
Personal Beliefs Inventory and the Teacher Practices Observation Record
and has identified the more powerful factors which influence both
observation scores and evaluations.

The factor analysis and multiple linear regression procedures
have helped to elucidate some of the very complex interactions among
beliefs, observations, and evaluations. They have also clearly identified
some classroom practices which nearly all observer-judges see as good
and reward with high evaluations. The teacher who focuses classroom
attention upon the pupils and their activities, who grants students
opportunities for freedom of expression, who develops lessons which go
beyond the simple acquisition of material presented in textbooks, and
who encourages self-discipline and internal motivation for learning
is the teacher that observer-judges will evaluate highly. Most
observer-judges, although of varying occupations and belief systems,
value these teacher behaviors as good when they see them.






CHAPTER V 

PHASE III (FOLLOW-UP STUDY) 



Purposes 

The objectives of Phase III were to: 

1. Follow up the student-teacher subjects of Phase II during
their first year of service as certified teachers through continued
observations and evaluations by observer-judges.

2. Compare the observations and evaluations of the classroom 
behavior of first-year teachers with those of experienced teachers. 

3. Compare the belief scores, classroom observation scores, 
and evaluations of the subjects as student teachers with those of 
the subjects as first-year teachers. 



Procedures 



Selection of the Sample. From the total of 407 student teachers
who had served as subjects in Phase II, 100 subjects were
chosen for study in Phase III. The 100 subjects were selected on the
basis of (1) their status as certified, employed teachers, and (2)
the completeness of the data collected on them in Phase II. In
addition, 100 experienced teachers¹ were added to the sample. Each of the
experienced teachers was selected randomly from the faculty of the
school in which the first-year teacher was serving. Thus, each
first-year teacher was matched with an experienced teacher, chosen randomly,
who was employed in the same school and who taught at the same grade
level or in the same general subject matter area.
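The matching rule can be sketched in Python as follows. This is a minimal illustration, not the study's actual procedure: the `Teacher` record, its `area` field (standing in for "same grade level or same general subject matter area"), the sample names, and the choice to skip a first-year teacher who has no eligible colleague are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Teacher:
    name: str
    school: str
    area: str  # hypothetical field: grade level or general subject-matter area


def match_experienced(first_year, faculty, rng):
    """Pair each first-year teacher with an experienced colleague drawn at
    random from those in the same school and the same level or area."""
    pairs = []
    used = set()  # each experienced teacher is matched at most once
    for novice in first_year:
        eligible = [f for f in faculty
                    if f.school == novice.school and f.area == novice.area
                    and f not in used]
        if not eligible:
            continue  # assumption: a novice with no eligible colleague is skipped
        partner = rng.choice(eligible)
        used.add(partner)
        pairs.append((novice, partner))
    return pairs


# Hypothetical example data.
novices = [Teacher("A", "School 1", "Grade 3"), Teacher("B", "School 2", "Science")]
faculty = [Teacher("C", "School 1", "Grade 3"), Teacher("D", "School 1", "Grade 3"),
           Teacher("E", "School 2", "Science"), Teacher("F", "School 2", "English")]

if __name__ == "__main__":
    for novice, partner in match_experienced(novices, faculty, random.Random(1967)):
        print(novice.name, "->", partner.name)
```

Passing a seeded `random.Random` makes a given draw reproducible while leaving each within-school choice random.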



The 300 observer-judges of Phase III were selected from the same
school systems as the observees. This group consisted of principals,
supervisors, central office personnel, and classroom teachers. Each pair
of teachers, first-year and experienced, was repeatedly observed and
evaluated by (1) the principal of the school in which they served, (2)
another principal serving in the same district, and (3) a member of the
supervisory or teaching staff of the school system. Thus, each pair of
subjects was repeatedly observed by three observer-judges. In addition,
three observer-judges from Phase II, university personnel, also observed
and judged a portion of the Phase III subjects.



¹Mean years of experience was 6.97.



Altogether 303 observer-judges observed and evaluated 200 classroom
teachers for a total of 1,892 observations and ratings.



Data Collection 

Every subject (teacher and observer-judge) completed the Study of
Beliefs (including the PBI, TPI, and the D-Scale) prior to the observation
period. Three observer-judges individually observed and evaluated each
pair of teachers three times, using the Teacher Practices Observation
Record (TPOR) and the Teacher Evaluation Scale (TES). Observations were
scheduled for the first three months of 1967; each teacher was visited
by the three observer-judges in January, again in February, and finally
in March. In total, nine observations and evaluations of competence
were made for each teacher-subject.
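The visiting design fixes the number of visits per teacher. A minimal sketch, assuming exactly one visit per judge per month and using hypothetical role labels for the three school-system observer-judges, enumerates the plan for one matched pair; it does not model the additional visits by the three university observers.

```python
from itertools import product

MONTHS = ("January", "February", "March")  # 1967 observation period
JUDGES = ("principal of the school",       # hypothetical role labels
          "another principal in the district",
          "supervisory or teaching staff member")
MEMBERS = ("first-year", "experienced")


def visits_for_pair(pair_id):
    """Every (pair, teacher, judge, month) visit planned for one matched pair."""
    return [(pair_id, member, judge, month)
            for member, judge, month in product(MEMBERS, JUDGES, MONTHS)]


def per_teacher_count(visits, member):
    """Number of planned visits to one member of the pair."""
    return sum(1 for _, m, _, _ in visits if m == member)


if __name__ == "__main__":
    plan = visits_for_pair(1)
    print(len(plan), per_teacher_count(plan, "first-year"))
```

Three judges times three months gives the nine observations per teacher stated above, or eighteen per matched pair.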



Analysis of Data 

The scores given the teachers by the observer-judges on the TPOR
and TES were submitted to multiple regression analysis in order to
identify variables which contributed to the variance of observation and
evaluation scores for both first-year and experienced teachers. Each
of these two scores was treated separately, and in turn, in a series
of increasingly complex regression models.

Teacher Practices Observation Record. Four models were proposed
in order to determine which variables contribute most to the TPOR scores,
which served as the response.

Model 1: This general model was fitted with 21 variables:

    PBI, TPI, POQ of observer-judge                        3 vars.
    PBI, TPI, POQ of teacher                               3 vars.
    15 two-way interactions of PBI, TPI, and
      POQ of teacher and observer                         15 vars.
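The count of 21 follows from six belief scores (three instruments for each of two people) plus every pairwise product of them, since six scores yield 6 x 5 / 2 = 15 distinct pairs. A sketch, assuming interactions were entered as simple products of raw scores (the report does not say whether scores were centered first); the dictionary keys such as `observer:PBI` are hypothetical.

```python
from itertools import combinations

INSTRUMENTS = ("PBI", "TPI", "POQ")
PEOPLE = ("observer", "teacher")
# Hypothetical keys such as "observer:PBI" or "teacher:POQ".
MAIN_KEYS = [f"{who}:{inst}" for who in PEOPLE for inst in INSTRUMENTS]


def model1_terms():
    """Names of the 21 predictors: 6 main effects plus 15 two-way products."""
    products = [f"{a} x {b}" for a, b in combinations(MAIN_KEYS, 2)]
    return MAIN_KEYS + products


def model1_row(scores):
    """Predictor values for one rating, in the same order as model1_terms().
    Interactions are raw products; no centering is applied (assumption)."""
    main = [scores[key] for key in MAIN_KEYS]
    products = [a * b for a, b in combinations(main, 2)]
    return main + products


if __name__ == "__main__":
    terms = model1_terms()
    print(len(terms), terms[6])
```

The fifteen products include both within-person pairs (for example, O's PBI x O's TPI) and cross-person pairs (for example, O's D-Scale x T's PBI), which matches the mix of terms listed for Model 1 in Table 34.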



The F-value was 7.00, which is significant at the .01
level. This regression model contributed a significant
amount of information about the TPOR scores.



Model 2: This general model was fitted with 40 variables. The
personal characteristics of observer and teacher were
brought into the model as main effects; their interactions
were not included. The 21 variables from Model 1 were
included with the personal characteristic variables to produce
the following:

    Observer Occupation                                    4 vars.
    Time of Observation                                    2 vars.
    Institution                                            5 vars.
    Length of Service of Teacher                           1 var.
    Teaching Level                                         1 var.
    Subject Matter Taught                                  6 vars.
    Variables from Model 1                                21 vars.



The calculated F-value was 9.17, again significant at the 
.01 level. Thus, the model accounts for a significant 
amount of response variance. 

Model 3: Model 3 was a 39-variable model which was designed to
identify relationships between teacher and observer
characteristics as they interact with belief scores in
accounting for variance of the Teacher Practices Observation
Record scores. The list of variables follows:

    Date of Observation                                    2 vars.
    Length of Service of Teacher                           1 var.
    Teaching Level                                         1 var.
    Beliefs of Observer-judge                              3 vars.
    Beliefs of Teacher                                     3 vars.
    Date x Occupation of Observers                         2 vars.
    Date x Teaching Level                                  2 vars.
    Occupation of Observers x Teaching Level               1 var.
    Date x Observers' Beliefs                              6 vars.
    Date x Teachers' Beliefs                               6 vars.
    Occupation of Observers x Observers' Beliefs           3 vars.
    Occupation of Observers x Teachers' Beliefs            3 vars.
    Teaching Level x Observers' Beliefs                    3 vars.
    Teaching Level x Teachers' Beliefs                     3 vars.



The F-value of the model was 4.06, which indicates this to 
be significant at the .01 level; thus the variables make a 
significant contribution in accounting for response variance. 

Model 4: This final model was designed to explore the relationships
of the belief scores of the observer-judge and the teacher
with the subject matter taught in accounting for variance
of the TPOR. The variables were as follows:

Subject Matter 6 vars. 

Observers' Beliefs 3 vars. 

Teachers' Beliefs 3 vars. 

Subject Matter x Beliefs 36 vars. 
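The 36 interaction terms are consistent with crossing six subject-matter indicators with the six belief scores (6 x 6 = 36), for 6 + 3 + 3 + 36 = 48 variables in all. A sketch, assuming the six subject-matter variables are 0/1 indicators for seven hypothetical subject categories with one reference category; the report does not describe how subject matter was coded.

```python
# Hypothetical subject labels; "S0" serves as the reference category.
SUBJECTS = ("S0", "S1", "S2", "S3", "S4", "S5", "S6")


def subject_dummies(subject):
    """Six 0/1 indicators; the reference category is coded all zeros."""
    if subject not in SUBJECTS:
        raise ValueError(f"unknown subject: {subject}")
    return [1.0 if subject == s else 0.0 for s in SUBJECTS[1:]]


def model4_row(subject, observer_beliefs, teacher_beliefs):
    """6 subject dummies + 6 belief scores + 36 subject x belief products."""
    dummies = subject_dummies(subject)
    beliefs = list(observer_beliefs) + list(teacher_beliefs)  # three scores each
    crosses = [d * b for d in dummies for b in beliefs]
    return dummies + beliefs + crosses


if __name__ == "__main__":
    row = model4_row("S2", (1, 2, 3), (4, 5, 6))
    print(len(row))
```

Under this coding, each product term lets the slope of one belief score differ for one subject relative to the reference subject.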

This model had an F-value of 4.39, significant at the .01 level.
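Each model's F-value tests whether its predictors jointly account for more response variance than chance. For an ordinary least-squares model with an intercept, the overall F can be recovered from R², the number of predictors k, and the number of cases n. A minimal sketch, assuming n = 1,892 ratings in every model (an assumption; the report does not give the n used in each model):

```python
def overall_f(r2, k, n):
    """Overall F for an OLS model with an intercept, k predictors and n cases:
    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), on (k, n - k - 1) df."""
    if not 0 <= r2 < 1 or n <= k + 1:
        raise ValueError("need 0 <= R^2 < 1 and n > k + 1")
    return (r2 / k) / ((1 - r2) / (n - k - 1))


N = 1892  # assumption: every model used all ratings

if __name__ == "__main__":
    print(round(overall_f(0.0729, 21, N), 2))  # Table 34, Model 1
    print(round(overall_f(0.1169, 40, N), 2))  # Table 34, Model 2
```

With k = 21 and k = 40 taken from the variable counts above, the function returns values close to the F-values printed for Models 1 and 2 in Table 34 (7.00 and 6.12).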






Findings 









I. Variables which contribute significantly to the variance in
observations of the classroom behavior of teachers as measured by the
scores on the Teacher Practices Observation Record are shown in Table 34.

II. Variables which contribute significantly to the variance in
evaluations of teacher competence as measured by TES scores are shown in
Table 35.



Discussion



I. What variables and combinations of variables contribute 
significantly to the variance in observations of the classroom behavior 
of teachers as measured by scores on the TPOR? 

A. How do the beliefs of the teacher influence the TPOR score
received? Model 1 indicates that the teacher's PBI score has a slightly
positive effect on the teacher's observed classroom behavior (TPOR
score) if his D-Scale score is high (open-minded) and the observer-judge's
D-Scale score is low (closed-minded). However, as the teacher's
D-Scale score decreases and the observer-judge's increases, the effect
of the teacher's PBI on the TPOR becomes negative.
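In a model with product terms, the effect of the teacher's PBI is not a single number: it shifts with both D-Scale scores. A sketch of this conditional ("simple") slope, using hypothetical coefficients chosen only to match the signs of the T's PBI x T's D-Scale and O's D-Scale x T's PBI t-values in Table 34 (the report gives t-values, not coefficients), and assuming D-Scale scores are expressed as deviations from their means:

```python
def teacher_pbi_slope(b_pbi, b_pbi_x_td, b_pbi_x_od, teacher_d, observer_d):
    """Change in predicted TPOR per unit of the teacher's PBI, holding the
    D-Scale scores fixed, in a model containing T's PBI x T's D-Scale and
    O's D-Scale x T's PBI product terms."""
    return b_pbi + b_pbi_x_td * teacher_d + b_pbi_x_od * observer_d


# Hypothetical coefficients: signs follow Table 34, Model 1; magnitudes invented.
B_PBI, B_TD, B_OD = 0.0, 0.02, -0.02

if __name__ == "__main__":
    # Open-minded teacher, closed-minded observer (high D = open-minded here).
    print(teacher_pbi_slope(B_PBI, B_TD, B_OD, teacher_d=10, observer_d=-10))
    # Closed-minded teacher, open-minded observer.
    print(teacher_pbi_slope(B_PBI, B_TD, B_OD, teacher_d=-10, observer_d=10))
```

The first call gives a positive slope and the second a negative one, which is the sign reversal described in the paragraph above.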

Model 1 also indicates that the teacher's TPI score is directly
related to the TPOR score given the teacher's observed teaching behavior.
This relationship is even more pronounced if the teacher is also in high
agreement with experimentalism on the PBI.

These data largely corroborate the findings in Phase II
(pp. 59 and 67). The teacher's beliefs about specific classroom
practices (TPI score) are more predictive of the teacher's observed
classroom behavior (TPOR score) than are beliefs about the fundamental
questions of philosophy (PBI score).

In general, the teacher's D-Scale score and the TPOR are positively
correlated. This relationship is more pronounced if the PBI is high
and the TPI low, and it weakens as these scores reverse. Again, the
PBI seems to provide an "extra kick" to the influence of the D-Scale
on the TPOR, just as it does for the influence of the TPI on the TPOR.

B. How do beliefs of the observer-judge influence the TPOR
score given? Again (as in Phase II, p. 59), it was found from Model 1
that the observer-judge's PBI has a slightly negative effect on the TPOR.
However, this is the case only when the observer-judge's TPI score is low.
As the observer-judge's TPI score increases, the effect becomes
increasingly positive.

The effect of the observer-judge's TPI is generally positive,
with this effect very pronounced if his PBI and D-Scale scores are high.
(Remember, in this study a high D-Scale score indicates open-mindedness.)










Table 34

Variables Which Contribute to Variance in
Observation of Classroom Behavior

Response: TPOR Total Scores

Variable                                               t-value

Model 1: R² = .0729 (F = 7.00**)
    T's PBI x T's D-Scale                              2.98**
    T's TPI x T's D-Scale                             -2.96**
    O's PBI x O's TPI                                  2.16**
    O's TPI x O's D-Scale                              1.79*
    O's D-Scale x T's PBI                             -1.69*
    O's PBI x O's D-Scale                             -1.47
    O's D-Scale x T's D-Scale                         -1.46

Model 2: R² = .1169 (F = 6.12**)
    Position - 1                                       1.46
               2                                       1.09
               3                                       4.45**
               4                                      -1.98**
    Institution - 1                                    2.12**
                  2                                   -2.54**
                  3                                    3.85**
                  4                                   -1.58
                  5                                    3.35**
    Grade Level                                       -3.38**
    Subject - 1                                        0.22
              2                                       -0.06
              3                                       -1.73*
              4                                        1.16
              5                                       -2.41**
              6                                       -2.49**
    T's TPI x T's D-Scale                             -3.05**
    T's PBI x T's D-Scale                              2.41**
    O's D-Scale x T's PBI                             -2.32**
    O's TPI x O's D-Scale                              2.13**
    O's PBI x O's TPI                                  1.36

Model 3: R² = .0787 (F = 4.06**)
    Teaching Level x T's PBI                          -3.41**
    Teaching Level x O's D-Scale                       2.91**
    T's Experience (New or Experienced) x T's PBI     -1.74*
    T's Experience x T's D-Scale                      -1.54
    O's TPI                                            3.72**
    T's TPI                                            3.01**
    O's PBI                                           -2.06**

** Significant at the .05 level
 * Significant at the .10 level

Note: T's = Teachers
      O's = Observers
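The starring in Table 34 (and in Table 35) appears consistent with two-tailed tests using large-sample normal critical values of about 1.96 and 1.645; for example, -1.63 in Table 35 is unstarred, while 1.66 carries one star. A sketch of that rule, assuming the normal approximation to the t distribution is adequate at these sample sizes:

```python
from statistics import NormalDist

# Two-tailed critical values from the standard normal distribution; this
# treats t as approximately normal, which is reasonable with very large df.
CRIT_05 = NormalDist().inv_cdf(1 - 0.05 / 2)  # about 1.960
CRIT_10 = NormalDist().inv_cdf(1 - 0.10 / 2)  # about 1.645


def stars(t):
    """Return the table's significance flag for a t-value."""
    if abs(t) >= CRIT_05:
        return "**"
    if abs(t) >= CRIT_10:
        return "*"
    return ""


if __name__ == "__main__":
    for t in (2.98, 1.79, -1.47, -1.98):
        print(t, stars(t))
```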








Table 35

Variables Which Contribute to Variance in
Evaluations of Teaching Competence

Response: TES Total Scores

Variable                                               t-value

Model 1: R² = .2741 (F = 25.12**)
    O's D-Scale x T's TPI                             -4.20**
    T's PBI x T's D-Scale                             -3.73**
    O's D-Scale x TPOR                                 2.99**
    O's D-Scale x T's PBI                              2.51**
    O's TPI x T's PBI                                 -2.51**
    O's TPI x O's D-Scale                              2.23**
    T's TPI x TPOR                                     2.26**
    O's TPI x T's D-Scale                             -1.89*
    O's TPI x T's TPI                                  1.83*
    O's PBI x T's PBI                                 -1.63
    T's D-Scale x TPOR                                -1.53

Model 2: R² = .3399 (F = 20.20**)
    Position - 1                                       0.96
               2                                      -1.67*
               3                                       1.33
               4                                       0.89
    Institution - 1                                    2.14**
                  2                                    4.71**
                  3                                    1.66*
                  4                                   -2.81**
                  5                                   -3.03**
    T's Position                                      -8.71**
    Subject - 1                                       -1.23
              2                                        0.93
              3                                       -2.95**
              4                                       -3.41**
              5                                        5.33**
              6                                       -2.01**
    O's D-Scale x T's PBI                             -4.22**
    T's PBI x T's D-Scale                             -3.32**
    O's D-Scale x TPOR                                 2.90**
    O's TPI x O's D-Scale                              2.79**
    O's D-Scale x T's PBI                              2.46**
    O's TPI x T's PBI                                 -2.14**
    O's TPI x T's D-Scale                             -1.74*
    O's PBI x T's TPI                                  1.70*
    T's TPI x T's D-Scale                             -1.66*
    T's D-Scale x TPOR                                -1.41
    T's TPI x TPOR                                     1.37







